The amount of data generated by IoT, smart devices, cloud applications, and social media is growing exponentially. You need ways to easily and cost-effectively analyze all of this data with minimal time-to-insight, regardless of the format or where the data is stored.
Amazon Redshift powers the lake house architecture, enabling you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights that would not be possible otherwise. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake. This makes the data easily available to other analytics and machine learning tools, rather than locking it in a new silo.
With an Amazon Redshift lake house architecture, you can:
- Easily query data in your data lake and write data back to your data lake in open formats.
- Use familiar SQL statements to combine and process data across all your data stores.
- Execute queries on live data in your operational databases without requiring any data loading or ETL pipelines.
Amazon Redshift lake house architecture is powered by the following capabilities:
Amazon Redshift Spectrum
Query open format data directly in the Amazon S3 data lake without having to load the data or duplicate your infrastructure. Using the Amazon Redshift Spectrum feature, you can query open file formats such as Apache Parquet, ORC, JSON, Avro, and CSV. Follow this step-by-step tutorial to get started.
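As a concrete illustration, the setup for querying Parquet files in S3 through Redshift Spectrum boils down to two DDL statements: one registering an external schema against the AWS Glue Data Catalog, and one defining an external table over the S3 location. The sketch below builds those statements; the schema name, Glue database, IAM role ARN, bucket, and table definition are all hypothetical placeholders, not values from this page.

```python
# Sketch: build the DDL a typical Redshift Spectrum setup runs.
# All identifiers (schema, Glue database, IAM role ARN, bucket,
# table columns) are hypothetical placeholders.

def spectrum_setup_sql(schema: str, glue_database: str,
                       iam_role: str, bucket: str) -> str:
    """Return DDL that registers an external schema backed by the
    AWS Glue Data Catalog and an external table over Parquet files
    stored in an S3 data lake."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role}';\n"
        "\n"
        f"CREATE EXTERNAL TABLE {schema}.sales (\n"
        "    sale_id   BIGINT,\n"
        "    amount    DOUBLE PRECISION,\n"
        "    sale_date DATE\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION 's3://{bucket}/sales/';"
    )

print(spectrum_setup_sql(
    "spectrum",
    "spectrumdb",
    "arn:aws:iam::123456789012:role/MySpectrumRole",
    "my-data-lake",
))
```

Once the external table exists, it can be joined with local Redshift tables in ordinary SQL; the data itself stays in S3.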
Data Lake Export
Save the results of an Amazon Redshift query directly to your S3 data lake in an open file format (Apache Parquet) using Data Lake Export. You can then analyze this data using the Amazon Redshift Spectrum feature, as well as other AWS services such as Amazon SageMaker for machine learning and Amazon EMR for ETL operations. Watch this 5-minute video to get started.
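Under the hood, Data Lake Export is an UNLOAD statement with `FORMAT AS PARQUET`. The sketch below builds such a statement; the query, bucket, prefix, and IAM role ARN are hypothetical placeholders.

```python
# Sketch: build a Data Lake Export (UNLOAD ... FORMAT AS PARQUET)
# statement. Query, bucket, prefix, and role ARN are placeholders.

def data_lake_export_sql(select_query: str, bucket: str,
                         prefix: str, iam_role: str) -> str:
    """Return an UNLOAD statement that writes the results of a
    Redshift query back to the S3 data lake as Apache Parquet.
    Note: any single quotes inside select_query would need to be
    doubled, per UNLOAD's quoting rules."""
    return (
        f"UNLOAD ('{select_query}')\n"
        f"TO 's3://{bucket}/{prefix}/'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET;"
    )

print(data_lake_export_sql(
    "SELECT sale_id, amount, sale_date FROM sales",
    "my-data-lake",
    "exports/sales",
    "arn:aws:iam::123456789012:role/MyRedshiftRole",
))
```

The exported Parquet files can then be queried in place by Redshift Spectrum or picked up by other tools reading from the same S3 prefix.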
Federated Query
Federated Query enables Amazon Redshift to query data directly in Amazon RDS and Aurora PostgreSQL stores. This allows you to incorporate timely and up-to-date operational data in your reporting and BI applications, without any ETL operations. Watch this 5-minute video or read this tutorial to get started.
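Setting up a federated query likewise comes down to one CREATE EXTERNAL SCHEMA statement pointing at the PostgreSQL endpoint. The sketch below builds it; the endpoint, database and schema names, IAM role ARN, and Secrets Manager ARN are hypothetical placeholders.

```python
# Sketch: build the CREATE EXTERNAL SCHEMA ... FROM POSTGRES
# statement used for federated queries. All identifiers and ARNs
# are hypothetical placeholders.

def federated_schema_sql(schema: str, pg_database: str, pg_schema: str,
                         host: str, iam_role: str, secret_arn: str) -> str:
    """Return DDL that lets Redshift query live tables in an
    RDS PostgreSQL or Aurora PostgreSQL database, with credentials
    stored in AWS Secrets Manager."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM POSTGRES\n"
        f"DATABASE '{pg_database}' SCHEMA '{pg_schema}'\n"
        f"URI '{host}' PORT 5432\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )

print(federated_schema_sql(
    "orders_live",
    "ordersdb",
    "public",
    "my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    "arn:aws:iam::123456789012:role/MyRedshiftRole",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:pg-creds",
))
```

After this, tables in the operational database appear under `orders_live` and can be joined directly with warehouse and Spectrum tables, with no data loading step.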
“We utilize many AWS and third-party analytics tools, and we are pleased to see Amazon Redshift continue to embrace the same varied data transform patterns that we already do with our own solution. We’ve harnessed Amazon Redshift’s ability to query open data formats across our data lake with Redshift Spectrum since 2017, and now with the new Redshift Data Lake Export feature, we can conveniently write data back to our data lake. This all happens with consistently fast performance, even at our highest query loads. We look forward to leveraging the synergy of an integrated big data stack to drive more data sharing across Amazon Redshift clusters, and derive more value at a lower cost for all our games.”
Kurt Larson, Technical Director of Analytics Marketing Operations - Warner Bros. Analytics
ETL and ELT design patterns for lake house architecture using Amazon Redshift: Part 1
Build scalable ETL and ELT design patterns for lake house architecture using Amazon Redshift: Part 1.
Getting started with Amazon Redshift Spectrum
Step-by-step tutorial to get started on Amazon Redshift Spectrum.
How to scale data analytics with Amazon Redshift
Learn how Warner Bros., an entertainment company, uses Amazon Redshift to scale its data analytics workloads.