Amazon Redshift Spectrum enables you to run Amazon Redshift SQL queries against exabytes of data in Amazon S3. With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake”, without having to load or transform any data. Redshift Spectrum applies sophisticated query optimization, scaling processing across thousands of nodes so results are fast, even with large data sets and complex queries.
Redshift Spectrum directly queries data in Amazon S3 using the open data formats you already use, including CSV, TSV, Parquet, SequenceFile, and RCFile. Because Redshift Spectrum supports the same SQL syntax as Amazon Redshift, you can run sophisticated queries using the same Business Intelligence (BI) tools you use today. You can also run queries that span both the frequently accessed data stored locally in Amazon Redshift and your full data sets stored cost-effectively in Amazon S3.
Redshift Spectrum gives you the freedom to store your data where you want, in the format you want, and have it available for processing when you need it. There are no up-front payments or commitments with Redshift Spectrum; you only pay for the queries you run.
With Amazon Redshift Spectrum, you can start querying your data in Amazon S3 immediately, with no loading or transformation required. You just need to register your Amazon Athena data catalog or Hive Metastore as an external schema. You can use the same SQL you use for querying Amazon Redshift tables and any BI tool that supports Redshift today.
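As a rough illustration of the registration step, the sketch below builds a CREATE EXTERNAL SCHEMA statement for either a data catalog or a Hive Metastore. The function name, schema name, database name, IAM role ARN, and metastore address are all hypothetical placeholders, not values from this page. The values are interpolated without escaping, so the sketch assumes trusted input.

```python
from typing import Optional


def external_schema_sql(
    schema: str,
    database: str,
    iam_role_arn: str,
    hive_uri: Optional[str] = None,
    hive_port: int = 9083,
) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement (illustrative helper only).

    If hive_uri is None, the schema points at the data catalog;
    otherwise it points at a Hive Metastore at hive_uri:hive_port.
    """
    if hive_uri is None:
        source = f"FROM DATA CATALOG DATABASE '{database}'"
    else:
        source = (
            f"FROM HIVE METASTORE DATABASE '{database}' "
            f"URI '{hive_uri}' PORT {hive_port}"
        )
    return f"CREATE EXTERNAL SCHEMA {schema} {source} IAM_ROLE '{iam_role_arn}';"


# Hypothetical names, for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
catalog_stmt = external_schema_sql("spectrum", "example_db", ROLE)
hive_stmt = external_schema_sql("hive_ext", "example_db", ROLE, hive_uri="10.0.0.5")
print(catalog_stmt)
print(hive_stmt)
```

Once a statement like this has been run on the cluster, tables in the external schema can be referenced in ordinary queries as `schema.table`, alongside local Redshift tables.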
Amazon Redshift delivers super-fast performance whether it is for ad-hoc analysis on large unstructured data sets in Amazon S3 or frequent analysis on structured data sets in Redshift tables. You can maintain hot data in your Amazon Redshift clusters to get the performance of local disks, and use Amazon Redshift Spectrum to extend your queries to cold data stored in Amazon S3 for unlimited scalability and low cost. The Amazon Redshift query optimizer will automatically determine how to minimize data scanned in Amazon S3 and the number of Redshift Spectrum nodes to use in the query.
With Amazon Redshift Spectrum, you don’t have to worry about scaling your cluster. It lets you separate storage and compute, allowing you to scale each independently. You can even run multiple Amazon Redshift clusters against the same Amazon S3 data lake, enabling limitless concurrency. Redshift Spectrum automatically scales out to thousands of instances if needed, so queries run quickly, whether processing a terabyte, a petabyte or an exabyte.
With Amazon Redshift Spectrum, you only pay for the queries you run. You are charged $5 per terabyte of data processed to execute your query. Redshift Spectrum can query compressed data. You can both save 30% to 90% on your per-query costs and improve performance by compressing, partitioning, and converting your data to a columnar format. There are no charges for Redshift Spectrum when you’re not running queries. You pay standard Amazon S3 rates for data storage and Amazon Redshift instance rates for the clusters used.
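To make the pricing arithmetic concrete, here is a minimal sketch that estimates the per-query charge from the $5 per terabyte rate quoted above and applies the 30% to 90% savings range. The function names are hypothetical, a terabyte is assumed to be 1024^4 bytes, and the sketch ignores any rounding or minimum charge that may apply in practice.

```python
PRICE_PER_TB_USD = 5.0      # rate quoted in the text above
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimated charge in USD for a query that processes bytes_scanned bytes."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def cost_after_savings(base_cost: float, savings_fraction: float) -> float:
    """Cost remaining after a fractional saving (e.g. 0.3 for 30%)."""
    if not 0.0 <= savings_fraction <= 1.0:
        raise ValueError("savings_fraction must be between 0 and 1")
    return base_cost * (1.0 - savings_fraction)


# Example: a query that processes 2 TB of uncompressed row-oriented data.
base = estimate_query_cost(2 * BYTES_PER_TB)
low_saving = cost_after_savings(base, 0.30)   # 30% saving
high_saving = cost_after_savings(base, 0.90)  # 90% saving
print(f"base=${base:.2f}, with 30% saving=${low_saving:.2f}, "
      f"with 90% saving=${high_saving:.2f}")
```

Under these assumptions, a 2 TB query that costs $10 would cost roughly $7 at the low end of the savings range and roughly $1 at the high end, which is why compression, partitioning, and columnar formats matter for per-query cost.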