AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.
However, setting up and managing data lakes today involves a lot of manual, complicated, and time-consuming tasks. This work includes loading data from diverse sources, monitoring those data flows, setting up partitions, turning on encryption and managing keys, defining transformation jobs and monitoring their operation, re-organizing data into a columnar format, configuring access control settings, deduplicating redundant data, matching linked records, granting access to data sets, and auditing access over time.
Creating a data lake with Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. Lake Formation then helps you collect and catalog data from databases and object storage, move the data into your new Amazon S3 data lake, clean and classify your data using machine learning algorithms, and secure access to your sensitive data. Your users can access a centralized data catalog which describes available data sets and their appropriate usage. Your users then leverage these data sets with their choice of analytics and machine learning services, like Amazon Redshift, Amazon Athena, and (in beta) Amazon EMR for Apache Spark. Lake Formation builds on the capabilities available in AWS Glue.
Build data lakes quickly
With Lake Formation, you can move, store, catalog, and clean your data faster. You simply point Lake Formation at your data sources, and Lake Formation crawls those sources and moves the data into your new Amazon S3 data lake. Lake Formation organizes data in S3 around frequently used query terms and into right-sized chunks to increase efficiency. Lake Formation also changes data into formats like Apache Parquet and ORC for faster analytics. In addition, Lake Formation has built-in machine learning to deduplicate and find matching records (two entries that refer to the same thing) to increase data quality.
Simplify security management
You can use Lake Formation to centrally define security, governance, and auditing policies in one place, versus doing these tasks per service, and then enforce those policies for your users across their analytics applications. Your policies are consistently implemented, eliminating the need to manually configure them across security services like AWS Identity and Access Management and AWS Key Management Service, storage services like S3, and analytics and machine learning services like Redshift, Athena, and (in beta) EMR for Apache Spark. This reduces the effort in configuring policies across services and provides consistent enforcement and compliance.
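As a rough illustration of defining a policy once, the sketch below builds a column-level SELECT grant for the Lake Formation GrantPermissions API and sends it through a client object. It assumes a boto3 "lakeformation" client is passed in. The helper names, role ARN, database, table, and column names are hypothetical and do not come from this page.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build keyword arguments for a column-level SELECT grant.

    All values passed in are illustrative placeholders, not real resources.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_column_select(client, principal_arn, database, table, columns):
    """Send the grant; `client` is assumed to be boto3.client("lakeformation")."""
    request = build_column_grant(principal_arn, database, table, columns)
    return client.grant_permissions(**request)
```

With a real client, a call such as grant_column_select(boto3.client("lakeformation"), role_arn, "sales", "orders", ["order_id", "total"]) would limit that role to the two named columns. Integrated services such as Athena are then expected to honor the same grant.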
Provide self-service access to data
With Lake Formation you build a data catalog that describes the different data sets that are available along with which groups of users have access to each. This makes your users more productive by helping them find the right data set to analyze. By providing a catalog of your data with consistent security enforcement, Lake Formation makes it easier for your analysts and data scientists to use their preferred analytics service.
They can use EMR for Apache Spark (in beta), Redshift, or Athena on diverse data sets now housed in a single data lake. Users can also combine these services without having to move data between silos.
How it works
Lake Formation helps to build, secure, and manage your data lake. First, identify existing data stores in S3 or relational and NoSQL databases, and move the data into your data lake. Then crawl, catalog, and prepare the data for analytics. Then provide your users secure self-service access to the data through their choice of analytics services. Other AWS services and third-party applications can also access data through the services shown. Lake Formation manages all of the tasks in the orange box and is integrated with the data stores and services shown in the blue boxes.
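For data that already sits in S3, the first steps above can be sketched in a few calls: register the storage location with Lake Formation, then create and start an AWS Glue crawler to catalog it. This is a minimal sketch that assumes boto3 "lakeformation" and "glue" clients are passed in. The function name and every ARN, path, and name argument are hypothetical placeholders. It does not cover moving data from databases, which the page attributes to Lake Formation itself.

```python
def register_and_crawl(lf_client, glue_client, bucket_arn, s3_path,
                       crawler_name, crawler_role_arn, database):
    """Register an S3 location and catalog it with a Glue crawler.

    `lf_client` and `glue_client` are assumed to be boto3 clients for
    "lakeformation" and "glue"; all other arguments are placeholders.
    """
    # Step 1: place the S3 location under Lake Formation management.
    lf_client.register_resource(ResourceArn=bucket_arn, UseServiceLinkedRole=True)

    # Step 2: define a crawler that writes table metadata into the catalog.
    glue_client.create_crawler(
        Name=crawler_name,
        Role=crawler_role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )

    # Step 3: run the crawler so the data sets appear in the data catalog.
    glue_client.start_crawler(Name=crawler_name)
```

Passing the clients in as arguments keeps the steps easy to dry-run against stub objects before they touch a real account.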
Panasonic Avionics Corporation is the world's leading supplier of in-flight entertainment and communication systems.
“We wanted to create a data platform with the ability to manage the security settings for all the different applications in our environment. With AWS Lake Formation, we can now define policies once and enforce them in the same way, everywhere, for multiple services we use, including AWS Glue and Amazon Athena,” said Anand Desikan, Director of Cloud and Data Services at Panasonic Avionics. “The enhanced level of control gives us secure access to data and meta-data for columns and tables, not just for bulk objects, which is an important part of our data security and governance standard.”
Accenture is a leading global professional services company, providing a broad range of services and solutions in strategy, consulting, digital, technology, and operations.
“I focus on helping clients in their ‘Data on Cloud’ journey. Specific to that, we have seen that organizations are dealing with a lack of trusted data when they need to perform analytics on data coming from multiple sources,” said Namrata Maheshwary, Senior Architect for the Data Business Group, Accenture. “Data cleansing is a critical step in data analytics and can greatly impact the business outcome and decision making. The new features in AWS Lake Formation have been hugely beneficial to address the challenge of data veracity and securing access to the data lake. We found it tremendously useful to make use of the advanced machine learning techniques for data preparation to find matching records, clean, and deduplicate data from different data sources. This will help reduce the time, effort, and cost, while improving the quality and accuracy of the data in a customer’s data lakes.”
Zalando is Europe’s leading online platform for fashion and lifestyle.
“As Europe’s most fashionable tech company, we work hard to find digital solutions for every aspect of the fashion journey,” said Alberto Miorin, Engineering Lead, Zalando SE. “AWS Lake Formation gave us a scalable central point of control for data access through Amazon Redshift that not only simplified the process, but improved it through granular control over how our data is being used. Now we can discover, access, and analyze data in our data lake with our preferred tools, and leverage it for business intelligence and data science. This streamlined workflow helps our executives make the right decisions on time, and fosters innovation through machine learning.”
Life360 is the world's leading peace of mind service for families. The Life360 app brings families closer with smart features designed to protect and connect the people who matter most.
“We wanted to use AWS Lake Formation to build our data lake for supporting location-based time-series data, and make it much easier to load data. The pre-fabricated blueprints helped get data into the data lake without our data engineering team having to write code from scratch, so they could focus on operationalizing ingest, not reinventing the wheel,” said Richard Chennault, Head of Cloud and Data Services, Life360, Inc. “With AWS Lake Formation we were able to quickly unlock data available in Amazon S3 and make it available to analyze across a broad spectrum of AWS data services. The data remains in place in Amazon S3, we can analyze it in many different ways, and we maintain full control over it.”
Change Healthcare is a leading independent healthcare technology company that provides data and analytics-driven solutions that reach approximately 2,100 government and commercial payer connections, 5,500 hospitals, 900,000 physicians, and 33,000 pharmacies.
“We handle data from millions of transactions daily while maintaining compliance with healthcare industry regulations, including HIPAA,” said Aaron Symanski, CTO of Change Healthcare. “We are very excited about the launch of AWS Lake Formation, which provides a central point of control to easily load, clean, secure, and catalog data from thousands of clients to our AWS-based data lake, dramatically reducing our operational load. The data access controls in Lake Formation will make it easy for us to define our policies once and have them be enforced across all the analytics and machine learning services we use, with audit logs to show compliance.”
Fender Digital is a part of Fender, the iconic guitar brand, that makes apps, websites, platforms and tools to complement the guitars, amps and audio gear that Fender makes.
“We are generating tons of user and usage data from our digital applications and devices. We are planning to build a data lake on AWS to operate alongside our Amazon Redshift based data warehouse,” said Joshua Couch, VP Engineering at Fender Digital. “I can’t wait for my team to get our hands on AWS Lake Formation. Lake Formation will make it easy for us to load, transform, and catalog our data and make it securely available within our organization, across a wide portfolio of AWS services. With an enterprise-ready option like Lake Formation, we will be able to spend more time deriving value from our data rather than doing the heavy lifting involved in manually setting up and managing our data lake.”
Supercharged by Cloudamize, a migration and management software platform, Cloudreach brings simplicity and absolute confidence to data-driven decision making.
“AWS Lake Formation is democratizing the data lake and creating a point of acceleration for enterprise data strategy,” said Kevin Davis, CTO AWS Practice, Cloudreach. “AWS Lake Formation centralizes security and governance of services, streamlining management and reducing operational overhead. By accelerating the process of de-siloing data across the enterprise, other data initiatives, such as machine learning, start to drive greater business value.”
Amgen is the world's largest independent biotechnology company.
“At Amgen we've been heavy users of Amazon Redshift and Amazon EMR clusters for over three years. Setting up security and access controls for each AWS account, service, user, and data set at the level of detail that was required could be cumbersome,” said Kerby Johnson, Enterprise Data Lake Product Owner, Amgen. “AWS Lake Formation streamlines the process with a central point of control while also enabling us to manage who is using our data, and how, with more detail. AWS Lake Formation allows us to manage permissions on Amazon S3 objects like we would manage permissions on data in a database. Our users will be able to find, access, and analyze the data they need with the tools they prefer. This new workflow can make everyone more productive when using Amgen’s data.”
Alcon is a leader in innovation and development of life-changing vision and eye care products.
“Like a lot of companies, we started our data lake initiative to get away from having inaccessible silos of data,” said Srinivas Ravilisetty, IT Analytics Lead, Alcon. “With AWS Lake Formation we can quickly add access to existing Amazon S3 buckets and define what's in them and how it can be used. The data remains in place in S3, but we have full control over it for other uses.”
Quantiphi is an Artificial Intelligence and Big Data software and services company driven by the desire to solve complex business problems. Quantiphi specializes in building data lakes and AI solutions for customers to deliver quantifiable value.
“AWS Lake Formation allows us to deliver a secure data lake with access to relevant data in days,” said Arnav Gupta, AWS Practice Lead, Quantiphi. “We now have the ability to deliver the best of both worlds for our customers – full security, plus simplified access to relevant data for their users to make decisions easily. Our customers can focus on making smarter, analysis-driven business decisions by tapping into a powerful, centralized data source.”
Curvo is a Software-as-a-Service company focused exclusively on the healthcare supply chain. With deep domain expertise and agile development practices, they build the analytics, the workflow, and the automation to make spend management in healthcare faster and easier.
“Data normalization is a critical step in providing better patient outcomes by bringing transparency into benchmark pricing data for clinical and medical products. Using ML Transformations in AWS Lake Formation, we now process data sets in four hours, down from one week, and our degree of accuracy improved to near 100%,” said Nic Sagez, CTO. “This speed and accuracy allows our healthcare customers to quickly respond to market changes, ultimately delivering more affordable care without sacrificing patient outcomes. We deliver to them in one day what takes our competitors 4-6 weeks.”