Amazon Redshift is the fastest and most widely used cloud data warehouse. Redshift is integrated with your data lake and offers up to 3x better price performance than any other data warehouse.
Features and benefits
Each year we release hundreds of features and product improvements, driven by customer use cases and feedback. Find out more about what’s new.
Deepest integration with your data lake and AWS services
Amazon Redshift lets you quickly and simply work with your data in open formats, and easily integrates with and connects to the AWS ecosystem.
Query and export data to and from your data lake: No other cloud data warehouse makes it as easy to both query data and write data back to your data lake in open formats. You can query open file formats such as Parquet, ORC, JSON, Avro, CSV, and more directly in S3 using familiar ANSI SQL. To export data to your data lake, use the Redshift UNLOAD command in your SQL code and specify Parquet as the file format; Redshift automatically takes care of data formatting and data movement into S3. This gives you the flexibility to store highly structured, frequently accessed data in a Redshift data warehouse, while also keeping up to exabytes of structured, semi-structured, and unstructured data in S3. Exporting data from Redshift back to your data lake enables you to analyze the data further with AWS services like Amazon Athena, Amazon EMR, and Amazon SageMaker.
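As a rough illustration of the export path described above, the following Python sketch assembles an UNLOAD statement that writes a query result to S3 as Parquet. The helper name `build_unload_to_parquet`, the `sales` table, the bucket, and the IAM role ARN are all hypothetical placeholders; only the UNLOAD, TO, IAM_ROLE, and FORMAT AS PARQUET keywords come from Redshift itself.

```python
def build_unload_to_parquet(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift UNLOAD statement that writes query results to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so single quotes inside it are doubled.
    escaped_query = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical table, bucket, and role, for illustration only.
statement = build_unload_to_parquet(
    select_sql="SELECT order_id, total FROM sales WHERE region = 'EU'",
    s3_prefix="s3://example-data-lake/sales/eu_",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftUnloadRole",
)
print(statement)
```

The resulting text would be submitted to the cluster through any SQL client. The sketch only builds the statement and does not validate the query or the S3 location.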
Federated Query: With the new federated query capability in Redshift, you can reach into your operational relational databases. Query live data across one or more Amazon RDS PostgreSQL and Aurora PostgreSQL databases, and (in preview) RDS MySQL and Aurora MySQL databases, to get instant visibility into end-to-end business operations without requiring data movement. You can join data from your Redshift data warehouse, data in your data lake, and now data in your operational stores to make better data-driven decisions. Redshift applies sophisticated optimizations to reduce the data moved over the network and complements them with its massively parallel data processing for high-performance queries.
Redshift ML (preview): Redshift ML is a new capability for Amazon Redshift that makes it easy for data analysts and database developers to create, train, and deploy Amazon SageMaker models using SQL. With Amazon Redshift ML, customers can use SQL statements to create and train Amazon SageMaker models on their data in Amazon Redshift and then use those models for predictions such as churn detection and risk scoring directly in their queries and reports. Visit the Redshift documentation to learn how to get started.
AWS analytics ecosystem: Native integration with the AWS analytics ecosystem makes it easier to handle end-to-end analytics workflows without friction. For example, AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. AWS Glue can extract, transform, and load (ETL) data into Redshift. Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load streaming data into Redshift for near real-time analytics. You can use Amazon EMR to process data using Hadoop/Spark and load the output into Amazon Redshift for BI and analytics. Amazon QuickSight is the first BI service with pay-per-session pricing that you can use to create reports, visualizations, and dashboards on Redshift data. You can use Redshift to prepare your data to run machine learning workloads with Amazon SageMaker. To accelerate migrations to Amazon Redshift, you can use the AWS Schema Conversion Tool and the AWS Database Migration Service (DMS). Amazon Redshift is also deeply integrated with AWS Key Management Service (KMS) and Amazon CloudWatch for security, monitoring, and compliance. You can also use Lambda UDFs to invoke a Lambda function from your SQL queries as if you were invoking a user-defined function in Redshift. You can write Lambda UDFs to integrate with AWS partner services and to access other popular AWS services such as Amazon DynamoDB or Amazon SageMaker.
Redshift partner console integration (preview): You can accelerate data onboarding and create valuable business insights in minutes by integrating with select partner solutions in the Redshift console. With these solutions you can bring data from applications like Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into your Amazon Redshift data warehouse in an efficient and streamlined way. It also enables you to join these disparate datasets and analyze them together to produce actionable insights.
Amazon Redshift offers fast, industry-leading performance with flexibility.
RA3 instances: RA3 instances deliver up to 3x better price performance than any other cloud data warehouse service. These Amazon Redshift instances maximize speed for performance-intensive workloads that require large amounts of compute capacity, with the flexibility to pay for compute independently of storage by specifying the number of instances you need.
AQUA (Advanced Query Accelerator) for Amazon Redshift: AQUA is a new distributed and hardware-accelerated cache that enables Redshift to run up to 10x faster than other enterprise cloud data warehouses by automatically boosting certain types of queries. AQUA uses high speed solid state storage, field-programmable gate arrays (FPGAs) and AWS Nitro to speed queries that scan, filter, and aggregate large data sets. AQUA is included with the Redshift RA3 instance type at no additional cost.
Efficient storage and high performance query processing: Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to petabytes. Columnar storage, data compression, and zone maps reduce the amount of I/O needed to perform queries. Along with the industry standard encodings such as LZO and Zstandard, Amazon Redshift also offers purpose-built compression encoding, AZ64, for numeric and date/time types to provide both storage savings and optimized query performance.
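To make the zone map idea above concrete, here is a deliberately simplified Python sketch: each block of a column keeps its minimum and maximum value, and a scan skips any block whose range cannot satisfy the predicate. The `Block` class, the ten-value block size, and the `scan_greater_equal` helper are illustrative assumptions, not Redshift's actual storage layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A fixed-size chunk of one column, with its min/max kept as metadata."""
    values: List[int]

    def __post_init__(self) -> None:
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_greater_equal(blocks, threshold):
    """Return (matching values, number of blocks actually read)."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # The min/max metadata proves no value in this block can match, so skip it.
        if block.max_value < threshold:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v >= threshold)
    return matches, blocks_read


# Sorted data gives blocks disjoint ranges, which is when skipping helps most.
column = list(range(1, 101))
blocks = [Block(column[i:i + 10]) for i in range(0, len(column), 10)]
matches, blocks_read = scan_greater_equal(blocks, threshold=95)
print(matches, blocks_read)
```

In this toy run only one of the ten blocks has to be read, which is the I/O reduction the paragraph refers to.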
Materialized views: Amazon Redshift materialized views allow you to achieve significantly faster query performance for iterative or predictable analytical workloads such as dashboarding, queries from Business Intelligence (BI) tools, and Extract, Load, Transform (ELT) data processing jobs. You can use materialized views to easily store and manage pre-computed results of a SELECT statement that may reference one or more tables, including external tables. Subsequent queries referencing the materialized views can run much faster by reusing the pre-computed results. Amazon Redshift can efficiently maintain the materialized views incrementally to continue to provide the low latency performance benefits.
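The idea of reusing pre-computed results and maintaining them incrementally can be sketched in a few lines of Python. This is a conceptual model only: the class `SalesByRegionView`, the `sales` query it stands for, and the insert-only refresh are assumptions chosen for illustration and say nothing about how Redshift implements refresh internally.

```python
from collections import defaultdict


class SalesByRegionView:
    """Pre-computed result of: SELECT region, SUM(amount) FROM sales GROUP BY region."""

    def __init__(self, base_rows):
        self.totals = defaultdict(int)
        self.apply_inserts(base_rows)  # initial full computation

    def apply_inserts(self, new_rows):
        # Incremental refresh: fold only the newly inserted rows into the stored totals.
        for region, amount in new_rows:
            self.totals[region] += amount

    def query(self, region):
        # Reads reuse the stored aggregate instead of rescanning the base table.
        return self.totals.get(region, 0)


view = SalesByRegionView([("EU", 100), ("US", 250), ("EU", 50)])
view.apply_inserts([("EU", 25)])
print(view.query("EU"))
```

The refresh touches one new row rather than the whole table, which is why incremental maintenance keeps the view cheap to keep current.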
Machine learning to maximize throughput and performance: Advanced machine learning capabilities in Amazon Redshift deliver high throughput and performance, even with varying workloads or concurrent user activity. Amazon Redshift utilizes sophisticated algorithms to predict and classify incoming queries based on their run times and resource requirements to dynamically manage performance and concurrency while also helping you to prioritize your business critical workloads. Short query acceleration (SQA) sends short queries from applications such as dashboards to an express queue for immediate processing so that they are not starved behind large queries. Automatic workload management (WLM) uses machine learning to dynamically manage memory and concurrency, helping maximize query throughput. In addition, you can now easily set the priority of your most important queries, even when hundreds of queries are being submitted. Amazon Redshift is also a self-learning system that observes the user workload continuously, determines the opportunities to improve performance as usage grows, applies optimizations seamlessly, and makes recommendations via Redshift Advisor when an explicit user action is needed to further improve Amazon Redshift performance.
Result caching: Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that execute repeat queries experience a significant performance boost. When a query executes, Amazon Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.
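The lookup described above (return the cached result only if one exists and the data has not changed) can be modeled with a small Python sketch. The `ResultCache` class and the integer `data_version` counter are hypothetical stand-ins; how Redshift actually detects that data has changed is not covered by the text.

```python
class ResultCache:
    """Caches query results and reuses them only while the underlying data is unchanged."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._cache = {}      # query text -> (data_version, result)
        self.executions = 0   # how many times a query really ran

    def execute(self, sql, data_version):
        cached = self._cache.get(sql)
        if cached is not None and cached[0] == data_version:
            return cached[1]  # cache hit: same query, data has not changed
        self.executions += 1
        result = self._run_query(sql)
        self._cache[sql] = (data_version, result)
        return result


table = [3, 5, 7]
cache = ResultCache(run_query=lambda sql: sum(table))  # stand-in for real execution

first = cache.execute("SELECT SUM(x) FROM t", data_version=1)
repeat = cache.execute("SELECT SUM(x) FROM t", data_version=1)   # served from cache
table.append(10)
after_change = cache.execute("SELECT SUM(x) FROM t", data_version=2)  # re-executed
print(first, repeat, after_change, cache.executions)
```

A dashboard that reissues the same query against unchanged data pays for execution once and gets every later answer from the cache.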
Whether you’re scaling data or users, Amazon Redshift is virtually unlimited.
Petabyte-scale data warehousing: Amazon Redshift is simple and quickly scales as your needs change. With a few clicks in the console or a simple API call, you can easily change the number or type of nodes in your data warehouse, and scale up or down as your needs change. With managed storage, capacity is added automatically to support workloads up to 8PB of compressed data. Learn more about managing your cluster.
Petabyte-scale data lake analytics: You can run queries against petabytes of data in Amazon S3 without having to load or transform any data with the Redshift Spectrum feature. You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats. Amazon Redshift Spectrum executes queries across thousands of parallelized nodes to deliver fast results, regardless of the complexity of the query or the amount of data.
Limitless concurrency: Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries, whether they query data in your Amazon Redshift data warehouse, or directly in your Amazon S3 data lake. Amazon Redshift Concurrency Scaling supports virtually unlimited concurrent users and concurrent queries with consistent service levels by adding transient capacity in seconds as concurrency increases.
Data sharing: Amazon Redshift data sharing enables a secure and easy way to scale by sharing live data across Redshift clusters. Data sharing improves the agility of organizations by giving instant, granular, and high-performance access to data inside any Redshift cluster without the need to copy or move it.
Using Amazon Redshift as your cloud data warehouse gives you flexibility to pay for compute and storage separately, the ability to pause and resume your cluster, predictable costs with controls, and options to pay as you go or save up to 75% with a Reserved Instance commitment.
Flexible pricing options: Amazon Redshift is the most cost-effective data warehouse, and you have choices to optimize how you pay for your data warehouse. You can start small for just $0.25 per hour with no commitments, and scale out for just $1000 per terabyte per year. Amazon Redshift is the only cloud data warehouse that offers On-Demand pricing with no up-front costs, Reserved Instance pricing which can save you up to 75% by committing to a 1- or 3-year term, and per-query pricing based on the amount of data scanned in your Amazon S3 data lake. Amazon Redshift’s pricing includes built-in security, data compression, backup storage, and data transfer. As the size of data grows you use managed storage in the RA3 instances to store data cost-effectively at $0.024 per GB per month.
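A short worked calculation shows how the quoted prices combine. The rates ($0.024 per GB per month, $0.25 per hour, up to 75% Reserved Instance savings) come from the paragraph above; the 10,000 GB volume, the 730-hour month, and applying the full 75% discount are assumptions made only for the arithmetic. Note that the $0.25 hourly rate and RA3 managed storage apply to different configurations, so the two figures are computed independently rather than summed.

```python
from decimal import Decimal

MANAGED_STORAGE_PER_GB_MONTH = Decimal("0.024")      # quoted RA3 managed storage rate
ON_DEMAND_STARTING_RATE_PER_HOUR = Decimal("0.25")   # quoted starting On-Demand rate
MAX_RESERVED_DISCOUNT = Decimal("0.75")              # "up to 75%", an upper bound
HOURS_PER_MONTH = 730                                # assumption: average month length


def monthly_storage_cost(gigabytes: int) -> Decimal:
    """Managed storage cost for one month."""
    return MANAGED_STORAGE_PER_GB_MONTH * gigabytes


def monthly_compute_cost(hourly_rate: Decimal, discount: Decimal = Decimal("0")) -> Decimal:
    """Compute cost for one always-on node, optionally with a Reserved Instance discount."""
    return hourly_rate * HOURS_PER_MONTH * (Decimal("1") - discount)


storage = monthly_storage_cost(10_000)  # hypothetical 10,000 GB of managed storage
on_demand = monthly_compute_cost(ON_DEMAND_STARTING_RATE_PER_HOUR)
reserved_best_case = monthly_compute_cost(ON_DEMAND_STARTING_RATE_PER_HOUR, MAX_RESERVED_DISCOUNT)
print(storage, on_demand, reserved_best_case)
```

Under these assumptions the storage comes to $240 per month, and the smallest node costs $182.50 per month On-Demand or about $45.63 at the maximum quoted discount. Actual Reserved Instance savings depend on term and payment option.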
Predictable cost, even with unpredictable workloads: Amazon Redshift allows customers to scale with minimal cost-impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of 97% of customers. This provides you with predictability in your month-to-month cost, even during periods of fluctuating analytical demand.
Choose your node type to get the best value for your workloads: You can select from three instance types to optimize Amazon Redshift for your data warehousing needs.
RA3 nodes enable you to scale storage independently of compute. With RA3 you get a high performance data warehouse that stores data in a separate storage layer. You only need to size the data warehouse for the query performance that you need.
Dense Compute (DC) nodes allow you to create very high-performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs) and are the best choice for less than 500GB of data.
DS2 (Dense Storage) nodes enable you to create large data warehouses using hard disk drives (HDDs) for a low price point when you purchase the 3-year Reserved Instances. Most customers who run on DS2 clusters can migrate their workloads to RA3 clusters and get up to 2x performance and more storage for the same cost as DS2.
Scaling your cluster or switching between node types requires a single API call or a few clicks in the AWS Console. Visit the pricing page for more information.
Easy to manage
Amazon Redshift automates common maintenance tasks so you can focus on your data insights, not your data warehouse.
Automated provisioning: Amazon Redshift is simple to set up and operate. You can deploy a new data warehouse with just a few clicks in the AWS console, and Amazon Redshift automatically provisions the infrastructure for you. Most administrative tasks are automated, such as backups and replication. When you want control, there are options to help you make adjustments tuned to your specific workloads. New capabilities are released transparently, eliminating the need to schedule and apply upgrades and patches.
Automated backups: Data in Amazon Redshift is automatically backed up to Amazon S3, and Amazon Redshift can asynchronously replicate your snapshots to S3 in another region for disaster recovery. You can use any system or user snapshot to restore your cluster using the AWS Management Console or the Redshift APIs. Your cluster is available as soon as the system metadata has been restored, and you can start running queries while user data is spooled down in the background.
Automated Table Design: Amazon Redshift continuously monitors user workloads and uses sophisticated algorithms to find ways to improve the physical layout of data to optimize query speeds. Automatic Table Optimization selects the best sort and distribution keys to optimize performance for the cluster’s workload. If Amazon Redshift determines that applying a key will improve cluster performance, tables will be automatically altered without requiring administrator intervention. Additional features such as Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze eliminate the need for manual maintenance and tuning of Redshift clusters to get the best performance for new clusters and production workloads.
Fault tolerant: There are multiple features that enhance the reliability of your data warehouse cluster. For example, Amazon Redshift continuously monitors the health of the cluster, and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance. Clusters can also be relocated to alternative Availability Zones (AZs) without any data loss or application changes.
Flexible querying: Amazon Redshift gives you the flexibility to execute queries within the console or connect SQL client tools, libraries, or Business Intelligence tools. The Query Editor on the AWS console provides a powerful interface for executing SQL queries on Amazon Redshift clusters and viewing the query results and query execution plan (for queries executed on compute nodes) adjacent to your queries.
Simple API to interact with Amazon Redshift: Amazon Redshift enables you to painlessly access data from all types of traditional, cloud-native, containerized, serverless web services-based, and event-driven applications. The Amazon Redshift Data API simplifies data access, ingest, and egress from programming languages and platforms supported by the AWS SDK such as Python, Go, Java, Node.js, PHP, Ruby, and C++. The Data API eliminates the need for configuring drivers and managing database connections. Instead, you can run SQL commands against an Amazon Redshift cluster by simply calling a secured API endpoint provided by the Data API. The Data API takes care of managing database connections and buffering data. The Data API is asynchronous, so you can retrieve your results later. Your query results are stored for 24 hours.
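Because the Data API is asynchronous, a caller typically submits a statement, polls for completion, and then fetches the result. The Python sketch below shows that flow using the `execute_statement`, `describe_statement`, and `get_statement_result` operations of the AWS SDK's `redshift-data` client. The helper name `run_sql`, the polling interval, and the decision to pass the client in as a parameter are illustrative choices, and the cluster, database, and user values in the usage comment are placeholders.

```python
import time


def run_sql(client, sql, cluster_id, database, db_user, poll_seconds=1.0):
    """Submit SQL through the Redshift Data API and return the first page of result rows."""
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )["Id"]

    # execute_statement returns immediately, so poll until the statement settles.
    while True:
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(description.get("Error", status))
        time.sleep(poll_seconds)

    # Only the first page is read here; large results continue via NextToken.
    return client.get_statement_result(Id=statement_id)["Records"]


# Usage with the AWS SDK for Python (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("redshift-data")
#   rows = run_sql(client, "SELECT 1", "example-cluster", "dev", "example_user")
```

Taking the client as an argument keeps the helper free of connection setup, which mirrors the point above that no drivers or connection pools need to be managed. The sketch assumes a statement that returns rows; statements without a result set would need a different final step.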
Native support for advanced analytics: Redshift supports standard scalar data types such as NUMBER, VARCHAR, and DATETIME and provides native support for the following advanced analytics processing:
Spatial data processing: Amazon Redshift provides a polymorphic data type, GEOMETRY, which supports multiple geometric shapes such as Point, Linestring, Polygon etc. Redshift also provides spatial SQL functions to construct geometric shapes, import, export, access and process the spatial data. You can add GEOMETRY columns to Redshift tables and write SQL queries spanning across spatial and non-spatial data. This capability enables you to store, retrieve, and process spatial data and seamlessly enhance your business insights by integrating spatial data into your analytical queries. With Redshift’s ability to seamlessly query data lakes, you can also easily extend spatial processing to data lakes by integrating external tables in spatial queries. See documentation for more details.
HyperLogLog sketches: HyperLogLog is a novel algorithm that efficiently estimates the approximate number of distinct values in a data set. An HLL sketch is a construct that encapsulates the information about the distinct values in the data set. You can use HLL sketches to achieve significant performance benefits for queries that compute approximate cardinality over large data sets, with an average relative error between 0.01% and 0.6%. Redshift provides a first-class data type, HLLSKETCH, and associated SQL functions to generate, persist, and combine HyperLogLog sketches. Amazon Redshift's HyperLogLog capability uses bias correction techniques and provides high accuracy with a low memory footprint. See documentation for more details.
DATE & TIME data types: Amazon Redshift provides multiple data types, DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ, to natively store and process date/time data. TIME and TIMESTAMP types store the time data without time zone information, whereas TIMETZ and TIMESTAMPTZ types store the time data including the time zone information. You can use various date/time SQL functions to process the date and time values in Redshift queries. See documentation for more details.
Semi-structured data processing: The Amazon Redshift SUPER data type natively stores JSON and other semi-structured data in Redshift tables, and uses the PartiQL query language to seamlessly process the semi-structured data. The SUPER data type is schemaless in nature and allows storage of nested values that may contain Redshift scalar values, nested arrays and nested structures. PartiQL is an extension of SQL and provides powerful querying capabilities such as object and array navigation, unnesting of arrays, dynamic typing, and schemaless semantics. This enables you to achieve advanced analytics that combine the classic structured SQL data with the semi-structured SUPER data with superior performance, flexibility and ease-of-use. See documentation for more details.
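For readers unfamiliar with the HyperLogLog sketches mentioned in the list above, the following is a minimal textbook-style implementation in Python showing how a sketch is generated, combined, and turned into a distinct-count estimate. It is a generic illustration, not Redshift's HLLSKETCH: the class name, the SHA-1 hash, and the 4,096-register precision are assumptions, and it omits the bias correction techniques the text attributes to Redshift, so its error (roughly 1.6% standard error at this size) is higher than the figures quoted above.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision           # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash
        index = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # The sketch of a union is the element-wise max of the two register arrays.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


# Two overlapping sets of hypothetical user IDs; the true union has 50,000 members.
a, b = HyperLogLog(), HyperLogLog()
for i in range(30_000):
    a.add(f"user-{i}")
for i in range(20_000, 50_000):
    b.add(f"user-{i}")
combined = a.merge(b)
print(round(combined.estimate()))
```

The merge step is what makes sketches worth persisting: partial sketches computed per day or per partition can be combined later without rescanning the underlying rows.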
Integrated with third-party tools: There are many options to enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming, and visualizing data. The Partners on our extensive list have certified their solutions to work with Amazon Redshift.
- Load and transform your data with Data Integration Partners
- Analyze data and share insights across your organization with Business Intelligence Partners
- Architect and implement your analytics platform with System Integration and Consulting Partners
- Query, explore and model your data using tools and utilities from Query and Data Modeling Partners
Most secure and compliant
AWS has comprehensive security capabilities to satisfy the most demanding requirements, and Amazon Redshift provides data security out-of-the-box at no extra cost.
End-to-end encryption: With just a couple of parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit, and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk, as well as any backups, will be encrypted. Amazon Redshift takes care of key management by default.
Network isolation: Amazon Redshift enables you to configure firewall rules to control network access to your data warehouse cluster. You can run Redshift inside Amazon Virtual Private Cloud (VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using an industry-standard encrypted IPsec VPN.
Audit and compliance: Amazon Redshift integrates with AWS CloudTrail to enable you to audit all Redshift API calls. Redshift logs all SQL operations, including connection attempts, queries, and changes to your data warehouse. You can access these logs using SQL queries against system tables, or choose to save the logs to a secure location in Amazon S3. Amazon Redshift is compliant with SOC1, SOC2, SOC3, and PCI DSS Level 1 requirements. For more details, please visit AWS Cloud Compliance.
Tokenization: AWS Lambda user-defined functions (UDFs) enable you to use an AWS Lambda function as a UDF in Amazon Redshift and invoke it from Redshift SQL queries. This functionality enables you to write custom extensions for your SQL query to achieve tighter integration with other services or third-party products. You can write Lambda UDFs to enable external tokenization, data masking, and identification or de-identification of data by integrating with vendors like Protegrity, and to protect or unprotect sensitive data at query time based on a user’s permissions and groups.
Granular access controls: Granular row and column level security controls ensure users see only the data they should have access to. Amazon Redshift is integrated with AWS Lake Formation, ensuring Lake Formation’s column level access controls are also enforced for Redshift queries on the data in the data lake.