Solutions Implementation FAQ

Q: What can the data lake solution manage on my behalf?

A: The solution manages a persistent catalog of organizational datasets in Amazon S3 and the business-relevant tags associated with each dataset. It allows companies to create simple governance policies that require specific tags when datasets are stored in the data lake solution.
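To illustrate what a tag-based governance policy means in practice, here is a minimal sketch in Python. It is not the solution's own implementation: the `GovernancePolicy` class, the `missing_tags` function, and the tag names are hypothetical and only show the idea of rejecting a dataset that lacks required tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernancePolicy:
    """Hypothetical policy: the tag names every dataset must carry."""
    required_tags: frozenset


def missing_tags(policy, dataset_tags):
    """Return the required tag names that are absent or blank, sorted."""
    present = {name for name, value in dataset_tags.items() if str(value).strip()}
    return sorted(policy.required_tags - present)


# Illustrative values only; your organization chooses its own tags.
policy = GovernancePolicy(required_tags=frozenset({"owner", "classification"}))
dataset_tags = {"owner": "finance-team", "retention": "7y"}

print(missing_tags(policy, dataset_tags))  # ['classification']
```

A dataset would pass this check only when `missing_tags` returns an empty list.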

Q: What type of datasets does the data lake solution support?

A: You can register existing or new datasets of any file type or size because the solution leverages the flexibility of Amazon S3.

Q: How do I upload my data to the data lake?

A: You can upload data files from the data lake solution console, or directly to an Amazon S3 bucket and then register them in the data lake.
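If you choose the direct-to-S3 route, the upload step might look like the following sketch. It assumes you use the AWS SDK for Python (boto3), which this FAQ does not mention; the helper name `upload_for_registration`, the bucket name, and the prefix are hypothetical. The sketch covers the upload only. Registering the object in the data lake is still done through the solution itself.

```python
import os


def upload_for_registration(s3_client, local_path, bucket, prefix=""):
    """Upload one file to S3 and return its s3:// location.

    s3_client is expected to behave like boto3.client("s3"); only its
    upload_file(Filename, Bucket, Key) method is used.
    """
    # Build the object key from an optional prefix and the file name.
    parts = (prefix.strip("/"), os.path.basename(local_path))
    key = "/".join(part for part in parts if part)
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

With boto3 installed and credentials configured, you could call it as `upload_for_registration(boto3.client("s3"), "sales.csv", "example-bucket", "raw/sales")` and then register the returned location in the data lake.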

Q: Can I use the data lake if I have existing data in Amazon S3?

A: Yes. You can register datasets with descriptive tags of your choice that point to existing objects in Amazon S3.

Q: How can I monitor the data lake?

A: The data lake logs API calls, latency, and error rates to Amazon CloudWatch in your AWS account. Additionally, you can turn on audit logging for your data lake deployment to monitor all user activity for compliance tracking.

Q: How quickly are solution logs available?

A: Logs, alarms, error rates, and other metrics are stored in Amazon CloudWatch and are available in near real time.

Q: How do I add and manage users in the data lake solution?

A: After the data lake solution is deployed, you can invite users to self-register and start using the data lake. You can continue to manage users, groups, and permissions for the data lake in the Administration section of the solution console.

Q: How is data transmitted to the data lake?

A: You have several options to add data to the data lake solution: use the data lake console or data lake CLI to upload files, or link to existing content in Amazon S3.

Q: Can I deploy the data lake solution in any AWS Region?

A: You can deploy the solution’s AWS CloudFormation template only in AWS Regions where Amazon Cognito, Amazon Athena, and AWS Glue are available. However, once deployed, you can invite users from around the globe to access the solution.

In addition to service-availability requirements, we recommend you deploy the data lake in the AWS Region where your data is stored for better performance and user interactivity.
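The service-availability rule above amounts to a set intersection: a Region qualifies only if all three services are offered there. The sketch below shows that check in Python. The function name `deployable_regions`, the service keys, and the Region names are placeholders, not real availability data; check current AWS Regional service listings before you deploy.

```python
# Services the template depends on, per the answer above. The key names
# follow common SDK service identifiers and are an assumption here.
REQUIRED_SERVICES = ("cognito-idp", "athena", "glue")


def deployable_regions(regions_by_service):
    """Return the Regions where every required service is available, sorted."""
    region_sets = [set(regions_by_service.get(name, ())) for name in REQUIRED_SERVICES]
    return sorted(set.intersection(*region_sets))


# Placeholder availability data for illustration only.
availability = {
    "cognito-idp": {"region-a", "region-b"},
    "athena": {"region-a", "region-b", "region-c"},
    "glue": {"region-a", "region-c"},
}

print(deployable_regions(availability))  # ['region-a']
```

If you use boto3, `boto3.session.Session().get_available_regions(service_name)` is one possible way to populate the input, though the endpoint data bundled with the SDK may lag behind actual availability.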

Training and Certification

AWS Training and Certification builds your competence, confidence, and credibility through practical cloud skills that help you innovate and build your future.

Big Data on AWS

Big Data on AWS introduces you to cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis, and the rest of the AWS big data platform.


Building a Serverless Data Lake on AWS

In this one-day, advanced course, you will learn to design, build, and operate a serverless data lake solution with AWS services.


AWS Certified Developer – Associate

This exam validates proficiency in developing, deploying, and debugging cloud-based applications using AWS.


Partner resources

The AWS Partner Network (APN) is focused on helping partners build successful AWS-based businesses to drive superb solutions and customer experiences. APN Partners are focused on customer success, helping you take full advantage of all the business benefits that AWS has to offer. With their deep expertise on AWS, APN Partners are uniquely positioned to help your company at any stage of your Cloud Adoption Journey and to help you solve some of your most complex problems.


Need more resources to get started with AWS?

Visit the Getting Started Resource Center to find tutorials, projects, and videos that help you get started with AWS.
