AWS Partner Network (APN) Blog

Data Governance Across AWS Organizations for Security and Compliance

By Ryo Hang, Sr. Solution Architect – ASCENDING
By Celeste Shao, Sr. Data Engineer – ASCENDING
By Gloria Zhang, Director, Cloud Projects – ASCENDING
By Shaun Wang, Sr. Partner Solutions Architect – AWS

Data governance plays an important role in ensuring the quality, consistency, and security of data used across an organization. It’s critical for data teams to have visibility into, and the ability to audit, user data access patterns across their organization.

Common challenges in a data governance system include data permission control, data security, compliance, onboarding processes, and productivity lost to data silos.

On top of that, organizations have to deal with workload independence and isolation, as it’s not ideal for large companies with more than one development team to share the same environment. With resource independence and isolation in place, each team can only access the data it has been granted, which supports data security and compliance.

While it’s traditionally difficult to manage credentials and access for a large organization, Amazon Web Services (AWS) simplifies this with a multi-account organization structure that isolates workloads or departments.

Building a data governance system on AWS enables users to achieve resource independence and isolation with ease, because users of other accounts do not have access to an account’s resources by default.

Using a multi-account structure with cross-account access is an AWS best practice that offers several other benefits:

  • Flexible security controls and scalability.
  • Easier monitoring of account compliance.
  • Data self-service.
  • A simpler data onboarding process.
  • A reduced blast radius.

In this post, we discuss how to set up a data governance system across AWS Organizations accounts, using a client use case and solution, and how ASCENDING overcame the technical challenges listed above. ASCENDING is an AWS Advanced Tier Services Partner with Competencies in DevOps and Data and Analytics.

Needs of the Customer

OTR Transportation Inc., a fast-growing logistics company, came to ASCENDING for help with data governance and analytics reporting. Here’s what they needed support with:

  • Data governance and management:
    • With a growing customer base, the client wanted centralized storage for the large volume of data coming from multiple ongoing data acquisition processes.
    • The client has multiple software vendors and needed to improve workload isolation and data isolation to meet security standards.
    • The client has data in different formats stored in various places and wanted to eliminate data silos while maintaining a scalable data infrastructure.
  • Embedded data analytics and reporting:
    • The client needed an easier and more streamlined way to analyze data so they could make better decisions on rates and potentially improve profit margins through useful market insights.
    • The client wanted to embed analytics into different custom dashboards.

Building a Centralized Data Lake from Multiple Accounts

The ASCENDING team worked with AWS solutions architects and figured out the best-fit solution for the client. The team came up with a data mesh architecture with AWS Lake Formation as a core service.

Data lakes are foundational for making sense of data at an organizational level. They remove data silos, making it easier to analyze diverse datasets, while keeping data secure and incorporating machine learning.

Data Lake Storage on AWS provides a centralized repository that can store both structured and unstructured data at scale. It allows users to store data without first having to structure it, and to run a variety of analytics.

Traditionally, it takes months to build a data lake because it requires heavy collaboration among different teams within the organization. AWS Lake Formation makes it easy to create secure data lakes and make data available for wide-ranging analytics, which drastically simplifies and shortens the process of building a data lake.

Figure 1 – How AWS Lake Formation can help.

ASCENDING Solution

ASCENDING first built a data lake with AWS Lake Formation, and then used data ingestion services such as Amazon Kinesis, AWS Glue, and AWS Database Migration Service (AWS DMS) to extract data from other accounts in the same organization.

In order to walk through the structure, we have highlighted a partial architecture with the following diagram and example.

Figure 2 – Data governance across typical AWS Organizations accounts.

To isolate workloads and data and follow security best practices, clients typically have multiple accounts in AWS Organizations. We will use the following three accounts as an example to walk through our solution:

  • Account #1 – Data Source Account
  • Account #2 – Lake Formation Central Account
  • Account #3 – Consumption Account

Since the data comes from different teams, we use Account #1, the data source account, as an example. The client wanted to manage all of the data in a centralized AWS account, which is Account #2. AWS Glue and Amazon Kinesis are the two services ASCENDING used to transfer data from third-party data sources, NoSQL databases, and relational databases.

All of those services write processed data to the source Amazon Simple Storage Service (Amazon S3) buckets, which act as the data source layer. We then created S3 bucket policies to enable cross-account access, so that AWS Glue jobs can securely transform and enrich the data and deliver it to the data lake account.
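As a rough illustration of the cross-account step above, the following Python sketch builds an S3 bucket policy document that lets a single role in the central account list and read a source bucket. The bucket name, account ID, and role name are hypothetical placeholders, not values from the client project, and the exact actions ASCENDING granted are not stated in this post.

```python
import json


def build_cross_account_read_policy(bucket_name: str, reader_role_arn: str) -> dict:
    """Build a bucket policy that lets one role in another account read the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "AllowCentralAccountList",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Object-level actions apply to the objects under the bucket.
                "Sid": "AllowCentralAccountRead",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# Hypothetical names: a source bucket in Account #1 and a Glue role in Account #2.
policy = build_cross_account_read_policy(
    "example-source-bucket",
    "arn:aws:iam::222222222222:role/example-glue-etl-role",
)
print(json.dumps(policy, indent=2))
```

A bucket policy alone is not enough for cross-account access: the role in the central account also needs an identity-based IAM policy that allows the same actions on the same bucket.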

Once all of the data reaches the “Landing Bucket,” we catalog it in the data lake and grant access to downstream applications like Amazon Athena or Amazon QuickSight for further analysis.
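To make the grant step more concrete, here is a minimal sketch that builds the request arguments for the Lake Formation `GrantPermissions` API, which boto3 exposes as `grant_permissions`. The account ID, database, table, and column names are hypothetical, and the sketch only constructs the payload so that it runs without AWS credentials.

```python
from typing import Optional, Sequence


def build_select_grant(
    principal: str,
    catalog_id: str,
    database: str,
    table: str,
    columns: Optional[Sequence[str]] = None,
) -> dict:
    """Build keyword arguments for a Lake Formation SELECT grant on one table."""
    table_ref = {"CatalogId": catalog_id, "DatabaseName": database, "Name": table}
    if columns:
        # Restrict the grant to the listed columns only.
        resource = {"TableWithColumns": {**table_ref, "ColumnNames": list(columns)}}
    else:
        resource = {"Table": table_ref}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


# Hypothetical example: let the consumption account query two columns of a table.
grant = build_select_grant(
    principal="333333333333",       # Account #3 (placeholder account ID)
    catalog_id="222222222222",      # Account #2 (placeholder account ID)
    database="example_landing_db",
    table="example_shipments",
    columns=["shipment_id", "rate"],
)
print(grant)

# With boto3 installed and data lake administrator credentials, the grant
# could then be submitted with:
#   boto3.client("lakeformation").grant_permissions(**grant)
```

Building the payload separately from the API call keeps grants easy to review and audit before they are applied.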

Achievements

The data lake infrastructure was built according to the client’s business requirements to help overcome big data challenges such as data governance, scalable data infrastructure, and rapid data acquisition.

The client used the data lake as a centralized and secure place to store data from both internal and external sources while allowing access to everyone in the organization with proper permissions and constraints. The system allowed the client to cut out unnecessary work and focus on extracting business value from data.

The client was initially looking for a data governance solution in the cloud. The data team from ASCENDING explored different use cases and designed the data lake. This solution helped the client identify its different data sources and acquire new ones more efficiently.

Lessons Learned

Data Replication

Data replication strategies to the data lake central account may vary. We always looked at the data consumption pattern to decide on the replication strategy. For example:

  • If the data consumption app focused on the state of the data, we replicated the entire data source, such as Amazon Aurora, to the central account once per week or month.
  • If the data consumption app cared about the changeset in real time, we used Amazon Kinesis to live-stream the record changes.
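For the real-time case, a change event has to be serialized and keyed before it is sent to a stream. The sketch below builds the arguments for the Kinesis `PutRecord` API (`put_record` in boto3). The stream name, table name, and event shape are assumptions made for illustration; the post does not describe the record format ASCENDING used.

```python
import json

ALLOWED_OPERATIONS = {"INSERT", "UPDATE", "DELETE"}


def build_change_record(
    stream_name: str, table: str, primary_key: str, operation: str, values: dict
) -> dict:
    """Build keyword arguments for a Kinesis PutRecord call from one row change."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    payload = {"table": table, "key": primary_key, "op": operation, "values": values}
    return {
        "StreamName": stream_name,
        # Kinesis expects the record data as bytes.
        "Data": json.dumps(payload, sort_keys=True).encode("utf-8"),
        # Records with the same partition key land on the same shard, so
        # changes to one row stay in sequence relative to each other.
        "PartitionKey": f"{table}:{primary_key}",
    }


# Hypothetical change event for a row in a "rates" table.
record = build_change_record(
    "example-change-stream", "rates", "42", "UPDATE", {"rate": 1.85}
)
print(record["PartitionKey"], record["Data"].decode("utf-8"))

# With boto3 installed and credentials configured, the record could be sent with:
#   boto3.client("kinesis").put_record(**record)
```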

Data Ingestion

ASCENDING ingested data with AWS Glue and cataloged it in AWS Lake Formation, but you are free to use a different service, such as AWS Lambda, AWS Batch, or Amazon Kinesis, to ingest data into the S3 bucket before cataloging it in Lake Formation.

  • The number of AWS Glue workers you can scale up to is limited by the available IP addresses in the subnet, so always make sure you assign enough IPs to Glue subnets.
  • Set up the necessary monitoring and alerts around AWS Glue jobs for cost optimization. Monitoring and alerts can prevent you from paying for unnecessarily long-running Glue jobs, which can be expensive.
  • AWS Lake Formation is a fast-evolving service:
    • At the time this solution was developed, we could only control permissions at the row level, but administrators can now control permissions at the column level as well as the row level.
    • All applications and workloads can now read data from the data lake, so the client no longer needs to grant access between applications. This is more secure and makes access easier to manage.
    • The client values the self-service and permission control aspects, as they saved a lot of effort for operations as well as auditing.
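The first two lessons above lend themselves to simple checks. The following sketch estimates how many more Glue workers a subnet can host and flags job runs that have been running longer than a chosen limit. It assumes one IP address per worker and uses run records shaped like the `JobRuns` entries returned by the Glue `GetJobRuns` API (`Id`, `JobRunState`, `StartedOn`); the subnet, run IDs, and two-hour limit are hypothetical.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_IPS_PER_SUBNET = 5


def max_additional_glue_workers(subnet_cidr: str, ips_in_use: int) -> int:
    """Estimate free IPs in a subnet, assuming each Glue worker needs one IP."""
    total = ipaddress.ip_network(subnet_cidr).num_addresses
    return max(total - AWS_RESERVED_IPS_PER_SUBNET - ips_in_use, 0)


def find_long_running_runs(job_runs: list, limit: timedelta, now: datetime) -> list:
    """Return the IDs of runs still in RUNNING state for longer than the limit."""
    return [
        run["Id"]
        for run in job_runs
        if run["JobRunState"] == "RUNNING" and now - run["StartedOn"] > limit
    ]


# Hypothetical subnet and job runs for illustration.
print(max_additional_glue_workers("10.0.1.0/24", ips_in_use=51))

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
runs = [
    {"Id": "jr_1", "JobRunState": "RUNNING", "StartedOn": now - timedelta(hours=5)},
    {"Id": "jr_2", "JobRunState": "RUNNING", "StartedOn": now - timedelta(minutes=20)},
    {"Id": "jr_3", "JobRunState": "SUCCEEDED", "StartedOn": now - timedelta(hours=9)},
]
print(find_long_running_runs(runs, timedelta(hours=2), now))
```

In practice, a check like this would feed an alert, and setting a timeout on the Glue job itself gives a hard stop as a second line of defense against runaway cost.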

Conclusion

In this post, we discussed the challenges of data governance in AWS Organizations. We used an ASCENDING client use case to walk through how the team tackled those challenges by combining AWS services such as AWS Lake Formation and AWS Glue.

If you have a unique data and analytics challenge in your organization, connect with ASCENDING. Here are some relevant hands-on videos to help you learn more:


ASCENDING – AWS Partner Spotlight

ASCENDING is an AWS Competency Partner that provides cloud migration, DevOps, and application development services to enterprise customers.

Contact ASCENDING | Partner Overview | AWS Marketplace