Amazon Redshift is the most popular and fastest cloud data warehouse. Redshift is integrated with your data lake, offers up to 3x faster performance than any other data warehouse, and costs up to 75% less than any other cloud data warehouse.
Features and benefits
Each year we release hundreds of features and product improvements, driven by customer use cases and feedback. Find out more about what’s new.
Deepest integration with your data lake and AWS services
Amazon Redshift lets you quickly and simply work with your data in open formats, and easily connects to the AWS ecosystem.
Query and export data to and from your data lake: No other cloud data warehouse makes it as easy to both query data and write data back to your data lake in open formats. You can query open file formats such as Parquet, ORC, JSON, Avro, CSV, and more directly in S3 using familiar ANSI SQL. To export data to your data lake you simply use the Redshift UNLOAD command in your SQL code and specify Parquet as the file format and Redshift automatically takes care of data formatting and data movement into S3. This gives you the flexibility to store highly structured, frequently accessed data in a Redshift data warehouse, while also keeping up to exabytes of structured, semi-structured, and unstructured data in S3. Exporting data from Redshift back to your data lake enables you to analyze the data further with AWS services like Amazon Athena, Amazon EMR, and Amazon SageMaker.
Federated Query (preview): With the new federated query capability in Redshift, you can reach into your operational, relational database. Query live data across one or more Amazon RDS and Aurora PostgreSQL databases to get instant visibility into the end-to-end business operations without requiring data movement. You can join data from your Redshift data warehouse, data in your data lake, and now data in your operational stores to make better data-driven decisions. Redshift offers sophisticated optimizations to reduce data moved over the network and complements it with its massively parallel data processing for high-performance queries. Try the preview today.
AWS analytics ecosystem: Native integration with the AWS analytics ecosystem makes it easier to handle end-to-end analytics workflows without friction. For example, AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. AWS Glue can extract, transform, and load (ETL) data into Redshift. Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load streaming data into Redshift for near real-time analytics. You can use Amazon EMR to process data using Hadoop/Spark and load the output into Amazon Redshift for BI and analytics. Amazon QuickSight is the first BI service with pay-per-session pricing that you can use to create reports, visualizations, and dashboards on Redshift data. You can use Redshift to prepare your data to run machine learning workloads with Amazon SageMaker. To accelerate migrations to Amazon Redshift, you can use the AWS Schema Conversion tool and the AWS Database Migration Service (DMS). Amazon Redshift is also deeply integrated with Amazon Key Management Service (KMS) and Amazon Cloudwatch for security, monitoring, and compliance.
Amazon Redshift offers fast, industry-leading performance with flexibility.
RA3 instances: RA3 instances deliver 3x the performance of any cloud data warehouse service. These Amazon Redshift instances maximize speed for performance-intensive workloads that require large amounts of compute capacity, with the flexibility to pay separately for compute independently of storage by specifying the number of instances you need.
Efficient storage and high performance query processing: Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to petabytes. Columnar storage, data compression, and zone maps reduce the amount of I/O needed to perform queries. Along with the industry standard encodings such as LZO and Zstandard, Amazon Redshift also offers purpose-built compression encoding, AZ64, for numeric and date/time types to provide both storage savings and optimized query performance.
Materialized views (preview): Amazon Redshift materialized views allow you to achieve significantly faster query performance for analytical workloads such as dashboarding, queries from Business Intelligence (BI) tools, and Extract, Load, Transform (ELT) data processing jobs. You can use materialized views to cache intermediate results in order to speed up slow-running queries. Amazon Redshift can efficiently maintain the materialized views incrementally to continue to provide the low latency performance benefits. Try the preview today.
Machine learning to maximize throughput and performance: Advanced machine learning capabilities in Amazon Redshift deliver high throughput and performance, even with varying workloads or concurrent user activity. Amazon Redshift utilizes sophisticated algorithms to predict and classify incoming queries based on their run times and resource requirements to dynamically manage performance and concurrency while also helping you to prioritize your business critical workloads. Short query acceleration (SQA) sends short queries from applications such as dashboards to an express queue for immediate processing rather than being starved behind large queries. Automatic workload management (WLM) uses machine learning to dynamically manage memory and concurrency, helping maximize query throughput. In addition, you can now easily set the priority of your most important queries, even when hundreds of queries are being submitted. Amazon Redshift is also a self-learning system that observes the user workload continuously, determining the opportunities to improve performance as the usage grows, applying optimizations seamlessly, and making recommendations via Redshift Advisor when an explicit user action is needed to further turbo charge Amazon Redshift performance.
Result caching: Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that execute repeat queries experience a significant performance boost. When a query executes, Amazon Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.
Whether you’re scaling data, or users, Amazon Redshift is virtually unlimited.
Petabyte-scale data warehousing: Amazon Redshift is simple and quickly scales as your needs change. With a few clicks in the console or a simple API call, you can easily change the number or type of nodes in your data warehouse, and scale up or down as your needs change. With managed storage, capacity is added automatically to support workloads up to 8PB of compressed data.
Petabyte-scale data lake analytics: You can run queries against petabytes of data in Amazon S3 without having to load or transform any data with the Redshift Spectrum feature. You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats. Amazon Redshift Spectrum executes queries across thousands of parallelized nodes to deliver fast results, regardless of the complexity of the query or the amount of data.
Limitless concurrency: Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries, whether they query data in your Amazon Redshift data warehouse, or directly in your Amazon S3 data lake. Amazon Redshift Concurrency Scaling supports virtually unlimited concurrent users and concurrent queries with consistent service levels by adding transient capacity in seconds as concurrency increases.
Using Amazon Redshift as your cloud data warehouse gives you flexibility to pay for compute and storage separately, predictable costs with controls, and options to pay as you go or save up to 75% with a Reserved Instance commitment.
Flexible pricing options: Amazon Redshift is the most cost-effective data warehouse, and you have choices to optimize how you pay for your data warehouse. You can start small for just $0.25 per hour with no commitments, and scale out for just $1000 per terabyte per year. Amazon Redshift is the only cloud data warehouse that offers On-Demand pricing with no up-front costs, Reserved Instance pricing which can save you up to 75% by committing to a 1- or 3-year term, and per-query pricing based on the amount of data scanned in your Amazon S3 data lake. Amazon Redshift’s pricing includes built-in security, data compression, backup storage, and data transfer. As the size of data grows you use managed storage in the RA3 instances to store data cost-effectively at $0.024 per GB per month.
Predictable cost, even with unpredictable workloads: Amazon Redshift allows customers to scale with minimal cost-impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of 97% of customers. This provides you with predictability in your month-to-month cost, even during periods of fluctuating analytical demand.
Choose your node type to get the best value for your workloads: You can select from three instance types to optimize Amazon Redshift for your data warehousing needs.
RA3 nodes enable you to scale storage independently of compute. With RA3 you get a high performance data warehouse that stores data in a separate storage layer. You only need to size the data warehouse for the query performance that you need.
Dense Compute (DC) nodes allow you to create very high-performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs) and are the best choice for less than 500GB of data.
DS2 (Dense Storage) nodes enable you to create large data warehouses using hard disk drives (HDDs) for a low price point when you purchase the 3-year Reserved Instances, making this the most cost-effective node type for storage heavy workloads. Most customers who run on DS2 clusters can migrate their workloads to RA3 clusters and get us to 2x performance and more storage for the same cost as DS2.
Scaling your cluster or switching between node types requires a single API call or a few clicks in the AWS Console. Visit the pricing page for more information.
Easy to manage
Amazon Redshift automates common maintenance tasks so you can focus on your data insights, not your data warehouse.
Automated provisioning: Amazon Redshift is simple to set up and operate. You can deploy a new data warehouse with just a few clicks in the AWS console, and Amazon Redshift automatically provisions the infrastructure for you. Most administrative tasks are automated, such as backups and replication. When you want control, there are options to help you make adjustments tuned to your specific workloads. New capabilities are released transparently, eliminating the need to schedule and apply upgrades and patches.
Automated backups: Data in Amazon Redshift is automatically backed up to Amazon S3, and Amazon Redshift can asynchronously replicate your snapshots to S3 in another region for disaster recovery. You can use any system or user snapshot to restore your cluster using the AWS Management Console or the Redshift APIs. Your cluster is available as soon as the system metadata has been restored, and you can start running queries while user data is spooled down in the background.
Fault tolerant: There are multiple features that enhance the reliability of your data warehouse cluster. For example, Amazon Redshift continuously monitors the health of the cluster, and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance.
Flexible querying: Amazon Redshift gives you the flexibility to execute queries within the console or connect SQL client tools, libraries, or Business Intelligence tools. The Query Editor on the AWS console provides a powerful interface for executing SQL queries on Amazon Redshift clusters and viewing the query results and query execution plan (for queries executed on compute nodes) adjacent to your queries.
Native spatial data processing: Amazon Redshift supports native spatial data processing functionality. This capability enables customers to store, retrieve, and process spatial data and seamlessly enhance your business insights by integrating spatial data into your analytical queries. Amazon Redshift provides a polymorphic data type, GEOMETRY, which supports multiple geometric shapes such as Point, Linestring, Polygon etc. Redshift also provides spatial SQL functions to construct geometric shapes, import, export, access and process the spatial data. You can add GEOMETRY columns to Redshift tables and write SQL queries spanning across spatial and non-spatial data. With Redshift’s ability to seamlessly query data lakes, you can also easily extend spatial processing to data lakes by integrating external tables in spatial queries.
Integrated with third-party tools: There are many options to enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming, and visualizing data. Our extensive list of Partners have certified their solutions to work with Amazon Redshift.
- Load and transform your data with Data Integration Partners
- Analyze data and share insights across your organization with Business Intelligence Partners
- Architect and implement your analytics platform with System Integration and Consulting Partners
- Query, explore and model your data using tools and utilities from Query and Data Modeling Partners
Most secure and compliant
AWS has comprehensive security capabilities to satisfy the most demanding requirements, and Amazon Redshift provides data security out-of-the-box at no extra cost.
End-to-end encryption: With just a couple of parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit, and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk will be encrypted as well as any backups. Amazon Redshift takes care of key management by default.
Network isolation: Amazon Redshift enables you to configure firewall rules to control network access to your data warehouse cluster. You can run Redshift inside Amazon Virtual Private Cloud (VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using an industry-standard encrypted IPsec VPN.
Audit and compliance: Amazon Redshift integrates with AWS CloudTrail to enable you to audit all Redshift API calls. Redshift logs all SQL operations, including connection attempts, queries, and changes to your data warehouse. You can access these logs using SQL queries against system tables, or choose to save the logs to a secure location in Amazon S3. Amazon Redshift is compliant with SOC1, SOC2, SOC3, and PCI DSS Level 1 requirements. For more details, please visit AWS Cloud Compliance.
Granular access controls: Granular row and column level security controls ensure users see only the data they should have access to. Amazon Redshift is integrated with AWS Lake Formation, ensuring Lake Formation’s column level access controls are also enforced for Redshift queries on the data in the data lake.