There is no additional charge for using Lake Formation features. Lake Formation helps you build and manage data lakes where your data is stored in Amazon S3. It builds on capabilities available in AWS Glue and uses the Glue Data Catalog, jobs, and crawlers. It also integrates with other services, including AWS CloudTrail, AWS IAM, Amazon CloudWatch, Amazon Athena, Amazon EMR, and Amazon Redshift.

Each of these services is billed at its own standard usage rates.

ETL job example: Consider a job of type Apache Spark that runs for 10 minutes and consumes 6 DPUs. The price of 1 DPU-Hour is $0.44. Since your job ran for 1/6th of an hour and consumed 6 DPUs, you will be billed for 6 DPUs * 1/6 hour at $0.44 per DPU-Hour, or $0.44.
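The arithmetic in the ETL job example can be reproduced with a short Python sketch. The function name `dpu_cost` and its parameters are hypothetical, chosen only for illustration. The sketch assumes cost is simply DPUs times hours times the DPU-Hour price, and it ignores any minimum billing duration or rounding rules that may apply to real bills.

```python
def dpu_cost(dpus, minutes, price_per_dpu_hour=0.44):
    """Return the cost in dollars of a run billed per DPU-Hour.

    Assumption: cost = DPUs * hours * price, with no minimum
    billing duration and no rounding of the runtime.
    """
    hours = minutes / 60
    return dpus * hours * price_per_dpu_hour


# The job from the example: 6 DPUs for 10 minutes at $0.44 per DPU-Hour.
etl_cost = dpu_cost(dpus=6, minutes=10)
print(f"ETL job cost: ${etl_cost:.2f}")
```

Doubling either the DPU count or the runtime doubles the cost under this simple model.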

Data catalog free tier example: Suppose you store a million tables in your data catalog in a given month and make a million requests to access these tables. You pay $0 because your usage is covered under the data catalog free tier: you can store the first million objects and make the first million requests each month for free.

Data catalog example: Now consider your storage usage remains the same at one million tables per month, but your requests double to two million requests per month. Let’s say you also use crawlers to find new tables and they run for 30 minutes and consume 2 DPUs.

Your storage cost is still $0, as the storage for your first million tables is free. Your first million requests are also free. You will be billed for one million requests above the free tier, which is $1. Crawlers are billed at $0.44 per DPU-Hour, so you will pay for 2 DPUs * 1/2 hour at $0.44 per DPU-Hour, or $0.44. Your total monthly bill is $1.44.
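The two data catalog examples can be checked with a small Python sketch. All names here (`catalog_monthly_bill`, the constants) are hypothetical. The request price of $1 per million requests above the free tier is taken from the example; charging it proportionally for partial millions is an assumption. The text gives no storage price above the free tier, so the sketch refuses to guess one.

```python
FREE_OBJECTS = 1_000_000          # objects stored free each month
FREE_REQUESTS = 1_000_000         # requests free each month
REQUEST_PRICE_PER_MILLION = 1.00  # dollars, from the example above
DPU_HOUR_PRICE = 0.44             # dollars per DPU-Hour


def catalog_monthly_bill(objects_stored, requests,
                         crawler_dpus=0, crawler_minutes=0):
    """Estimate a monthly data catalog bill in dollars (illustrative only)."""
    if objects_stored > FREE_OBJECTS:
        # The storage price beyond the free tier is not stated in the text.
        raise ValueError("storage above the free tier is not modeled")

    # Only requests beyond the free tier are billed (prorating assumed).
    billable_requests = max(0, requests - FREE_REQUESTS)
    request_cost = billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION

    # Crawlers are billed per DPU-Hour, like jobs.
    crawler_cost = crawler_dpus * (crawler_minutes / 60) * DPU_HOUR_PRICE

    return request_cost + crawler_cost


free_tier = catalog_monthly_bill(1_000_000, 1_000_000)
with_crawler = catalog_monthly_bill(1_000_000, 2_000_000,
                                    crawler_dpus=2, crawler_minutes=30)
print(f"Free tier month: ${free_tier:.2f}")
print(f"Month with extra requests and a crawler: ${with_crawler:.2f}")
```

Keeping the request and crawler charges as separate terms mirrors how the example adds $1 and $0.44 to reach the total.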

ML Transforms example: Similar to AWS Glue job runs, the cost of running ML Transforms, including FindMatches, on your data varies with the size of your data, the content of your data, and the number and types of nodes that you use. In the following example, we used FindMatches to integrate points-of-interest information from multiple data sources. With a data set of ~11,000,000 rows (1.6 GB) and label data (examples of true matches or true no-matches) of ~8,000 rows (641 KB), running on 16 instances of type G.2X, you would have a labelset generation runtime of 34 minutes at a cost of $8.23, a metrics estimation runtime of 11 minutes at a cost of $2.66, and a FindMatches job execution runtime of 32 minutes at a cost of $7.75.

