Amazon Redshift extends data warehouse queries to your data lake, with no loading required. You can run analytic queries against petabytes of data stored locally in Redshift, and directly against exabytes of data stored in Amazon S3. It is simple to set up, automates most of your administrative tasks, and delivers fast performance at any scale.
Massively parallel: Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to exabytes. Redshift uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries. It uses a massively parallel processing (MPP) data warehouse architecture to parallelize and distribute SQL operations to take advantage of all available resources. The underlying hardware is designed for high performance data processing, using local attached storage to maximize throughput between the CPUs and drives, and a high bandwidth mesh network to maximize throughput between nodes.
Machine learning: Amazon Redshift uses machine learning to deliver high throughout, irrespective of your workloads or concurrent usage. Redshift utilizes sophisticated algorithms to predict incoming query run times, and assigns them to the optimal queue for the fastest processing. For example, queries such as dashboards and reports with high concurrency requirements are routed to an express queue for immediate processing. As concurrency increases further, Amazon Redshift predicts when queuing may begin and automatically deploys transient resources with the Concurrency Scaling feature to ensure consistently fast performance, irrespective of variability in demand on the cluster.
Result caching: Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that execute repeat queries experience a significant performance boost. When a query executes, Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.
Easy to setup, deploy, and manage
Automated provisioning: Amazon Redshift is simple to set up and operate. You can deploy a new data warehouse with just a few clicks in the AWS console, and Redshift automatically provisions the infrastructure for you. Most administrative tasks are automated, such as backups and replication, so you can focus on your data, not the administration. When you want control, Redshift provides options to help you make adjustments tuned to your specific workloads. New capabilities are released transparently, eliminating the need to schedule and apply upgrades and patches.
Automated backups: Amazon Redshift automatically and periodically backs up your data to Amazon S3. Redshift can asynchronously replicate your snapshots to S3 in another region for disaster recovery. You can use any system or user snapshot to restore your cluster using the AWS Management Console or the Redshift APIs. Your cluster is available as soon as the system metadata has been restored, and you can start running queries while user data is spooled down in the background.
Fault tolerant: Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster. Redshift continuously monitors the health of the cluster, and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance.
Flexible querying: Amazon Redshift gives you the flexibility to execute queries within the console or connect SQL client tools, libraries, or Business Intelligence tools you love. Query Editor on the AWS console provides a powerful interface for executing SQL queries on Redshift clusters and viewing the query results and query execution plan (for queries executed on compute nodes) adjacent to your queries.
Integrated with third-party tools: Enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming and visualizing data. Our extensive list of Partners have certified their solutions to work with Amazon Redshift.
- Load and transform your data with Data Integration Partners »
- Analyze data and share insights across your organization with Business Intelligence Partners »
- Architect and implement your analytics platform with System Integration and Consulting Partners »
- Query, explore and model your data using tools and utilities from Query and Data Modeling Partners »
No upfront costs, pay as you go: Amazon Redshift is the most cost-effective data warehouse, and you pay only for the resources you provision. You can start small for just $0.25 per hour with no commitments, and scale out for just $250 per terabyte per year. Redshift is the only cloud data warehouse that offers On-Demand pricing with no up-front costs, Reserved Instance pricing which can save you up to 75% by committing to a 1- or 3-year term, and per query pricing based on the amount of data scanned in your Amazon S3 data lake. For more information, see the Amazon Redshift Pricing page.
Predictable cost, even with unpredictable workloads: Amazon Redshift allows customers to scale with minimal cost-impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of most customers. This provides customers with predictability in their month-to-month cost, even during periods of fluctuating analytical demand.
Choose your node type: You can select from two node types to optimize Redshift for your data warehousing needs. Dense Compute (DC) nodes allow you to create very high performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs). If you want to scale further or reduce costs, you can switch to our more cost-effective Dense Storage (DS) node types that use larger hard disk drives for a very low price point. Scaling your cluster or switching between node types requires a single API call or a few clicks in the AWS Console.
Scale quickly to meet your needs
Petabyte-scale data warehousing: Amazon Redshift is simple and quickly scales as your needs change. With a few clicks in the console or a simple API call, you can easily change the number or type of nodes in your data warehouse, and scale up or down as your needs change.
Exabyte-scale data lake analytics: Redshift Spectrum, a feature of Redshift, enables you to run queries against exabytes of data in Amazon S3 without having to load or transform any data. You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats.
Limitless concurrency: Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries - whether they query data in your Amazon Redshift data warehouse, or directly in your Amazon S3 data lake. With Concurrency Scaling, Amazon Redshift automatically adds transient capacity as concurrency increases. With Spectrum, Amazon Redshift provides limitless concurrency by enabling multiple queries to access the same data simultaneously in Amazon S3. Redshift Spectrum executes queries across thousands of parallelized nodes to deliver fast results, regardless of the complexity of the query or the amount of data.
Query your data lake
Amazon S3 data lake: Amazon Redshift is the only data warehouse that extends your queries to your Amazon S3 data lake without loading data. You can query open file formats you already use, such as Avro, CSV, Grok, JSON, ORC, Parquet, and more, directly in S3. This gives you the flexibility to store highly structured, frequently accessed data on Redshift local disks, keep exabytes of structured and unstructured data in S3, and query seamlessly across both to provide unique insights that you would not be able to obtain by querying independent datasets.
AWS analytics ecosystem: Amazon Redshift is natively integrated with the AWS analytics ecosystem. AWS Glue can extract, transform, and load (ETL) data into Redshift. Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load streaming data into Redshift for near real-time analytics. You can use Amazon QuickSight to create reports, visualizations, and dashboards. To accelerate your migration to Amazon Redshift, you can use the AWS Database Migration Service (DMS) free for six months. Learn more »
End-to-end encryption: With just a couple of parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit, and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk will be encrypted as well as any backups. By default, Amazon Redshift takes care of key management.
Network isolation: Amazon Redshift enables you to configure firewall rules to control network access to your data warehouse cluster. You can run Amazon Redshift inside Amazon VPC to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using industry-standard encrypted IPsec VPN.
Audit and compliance: Amazon Redshift integrates with AWS CloudTrail to enable you to audit all Redshift API calls. Redshift logs all SQL operations, including connection attempts, queries, and changes to your database. You can access these logs using SQL queries against system tables, or choose to download the logs to a secure location on Amazon S3. Amazon Redshift is compliant with SOC1, SOC2, SOC3 and PCI DSS Level 1 requirements. For more details, please visit AWS Cloud Compliance.