Amazon Redshift accelerates your time to insights with fast, easy, and secure cloud data warehousing at scale.
Features and benefits
Each year we release hundreds of features and product improvements, driven by customer use cases and feedback. Find out more about what’s new.
Analyze all your data
Get integrated insights by running real-time and predictive analytics on complex, scaled data across your operational databases, data lakes, data warehouses, and thousands of third-party datasets.
Federated query: With the new federated query capability in Amazon Redshift, you can reach into your operational relational databases. Query live data across one or more Amazon Relational Database Service (RDS), Aurora PostgreSQL, RDS MySQL, and Aurora MySQL databases to get instant visibility into the full business operations without requiring data movement. You can join data from your Redshift data warehouses, data in your data lakes, and data in your operational stores to make better data-driven decisions. Amazon Redshift offers optimizations to reduce data movement over the network and complements it with its massively parallel data processing for high-performance queries. Learn more.
Data Sharing: Amazon Redshift data sharing allows you to extend the ease of use, performance, and cost benefits of Amazon Redshift in a single cluster to multi-cluster deployments while being able to share data. Data sharing enables instant, granular, and fast data access across Redshift clusters without the need to copy or move it. Data sharing provides live access to data so your users always see the most current and consistent information as it’s updated in the data warehouse. You can securely share live data with Redshift clusters in the same or different AWS accounts and across regions. Learn more.
AWS Data Exchange for Amazon Redshift: Query Amazon Redshift datasets from your own Redshift cluster without extracting, transforming, and loading (ETL) the data. You can subscribe to Redshift cloud data warehouse products in AWS Data Exchange. As soon as a provider makes an update, the change is visible to subscribers. If you are a data provider, access is automatically granted when a subscription starts and revoked when it ends, invoices are automatically generated when payments are due, and payments are collected through AWS. You can license access to flat files, data in Amazon Redshift, and data delivered through APIs, all with a single subscription. Learn more.
Redshift ML: Redshift ML makes it easy for data analysts, data scientists, BI professionals, and developers to create, train, and deploy Amazon SageMaker models using SQL. With Redshift ML, you can use SQL statements to create and train Amazon SageMaker models on your data in Amazon Redshift and then use those models for predictions such as churn detection, financial forecasting, personalization, and risk scoring directly in your queries and reports. Learn more.
Amazon Redshift Integration for Apache Spark: This feature makes it easy to build and run Apache Spark applications on Amazon Redshift data, enabling customers to open up the data warehouse for a broader set of analytics and machine learning solutions. With Amazon Redshift Integration for Apache Spark, developers using AWS analytics and ML services such as Amazon EMR, AWS Glue, Amazon Athena Spark, and Amazon SageMaker can get started in seconds and build Apache Spark applications that read from and write to their Amazon Redshift data warehouse without compromising on the performance of the applications or the transactional consistency of the data. Amazon Redshift Integration for Apache Spark also makes it easier to monitor and troubleshoot performance issues in Apache Spark applications when they are used with Amazon Redshift.
Amazon Aurora Zero-ETL to Amazon Redshift: This is a no-code integration between Amazon Aurora and Amazon Redshift that enables Amazon Aurora customers to use Amazon Redshift for near real-time analytics and machine learning on petabytes of transactional data. Within seconds of transactional data being written into Amazon Aurora, Amazon Aurora Zero-ETL to Amazon Redshift makes the data available in Amazon Redshift, eliminating the need for customers to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations. This integration reduces operational burden and cost, and enables customers to focus on improving their applications. With near real-time access to transactional data, customers can use Amazon Redshift’s analytics and machine learning capabilities to derive insights from transactional and other data and respond effectively to critical, time-sensitive events.
Streaming Ingestion: Data engineers, data analysts, and big data developers are using real-time streaming engines to improve customer responsiveness. With the new streaming ingestion capability in Amazon Redshift, you can use SQL (Structured Query Language) to connect to and directly ingest data from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (MSK). Amazon Redshift Streaming Ingestion also makes it easy to create and manage downstream pipelines by letting you create materialized views on top of streams directly. The materialized views can also include SQL transformations as part of your ELT (Extract Load Transform) pipeline. You can manually refresh defined materialized views to query the most recent streaming data. This approach allows you to perform downstream processing and transformations of streaming data using existing familiar tools at no additional cost.
Query and export data to and from your data lake: No other cloud data warehouse makes it as easy to both query data and write data back to your data lake in open formats. You can query open file formats such as Parquet, ORC, JSON, Avro, CSV, and more directly in Amazon S3 using familiar ANSI SQL. To export data to your data lake, simply use the Amazon Redshift UNLOAD command in your SQL code and specify Parquet as the file format, and Amazon Redshift automatically takes care of data formatting and data movement into S3. This gives you the flexibility to store highly structured, frequently accessed data and semi-structured data in an Amazon Redshift data warehouse, while keeping up to exabytes of structured, semi-structured, and unstructured data in Amazon S3. Exporting data from Amazon Redshift back to your data lake lets you analyze the data further with AWS services such as Amazon Athena, Amazon EMR, and Amazon SageMaker.
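As a rough illustration of the export path described above, the following Python sketch assembles an UNLOAD statement that writes query results to Amazon S3 as Parquet. The table name, bucket prefix, and IAM role ARN are hypothetical placeholders, and the helper function itself is only an assumption made for illustration; you would submit the resulting SQL through whichever SQL client you already use.

```python
def build_unload_statement(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift UNLOAD statement that exports query results as Parquet."""
    # UNLOAD receives the query as a quoted string, so any single quotes
    # inside the query have to be doubled.
    escaped_query = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical table, bucket, and role names, used only for illustration.
statement = build_unload_statement(
    "SELECT order_id, region, total FROM sales WHERE region = 'EMEA'",
    "s3://example-data-lake/exports/sales_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftUnloadRole",
)
print(statement)
```

Once the files are in S3, other services can read them in place, which is what makes the round trip between warehouse and data lake practical.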
AWS services integration: Native integration with AWS services, databases, and machine learning services makes it easier to handle complete analytics workflows without friction. For example, AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. AWS Glue can extract, transform, and load (ETL) data into Amazon Redshift. Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load streaming data into Amazon Redshift for near real-time analytics. You can use Amazon EMR to process data using Hadoop/Spark and load the output into Amazon Redshift for BI and analytics. Amazon QuickSight is the first BI service with pay-per-session pricing that you can use to create reports, visualizations, and dashboards on Redshift data. You can use Amazon Redshift to prepare your data to run machine learning (ML) workloads with Amazon SageMaker. To accelerate migrations to Amazon Redshift, you can use the AWS Schema Conversion Tool and the AWS Database Migration Service (DMS). Amazon Redshift is also deeply integrated with AWS Key Management Service (KMS) and Amazon CloudWatch for security, monitoring, and compliance. You can also use Lambda user-defined functions (UDFs) to invoke a Lambda function from your SQL queries as if you were invoking a UDF in Amazon Redshift. You can write Lambda UDFs to integrate with AWS Partner services and to access other popular AWS services such as Amazon DynamoDB and Amazon SageMaker.
Partner console integration: You can accelerate data onboarding and create valuable business insights in minutes by integrating with select partner solutions in the Amazon Redshift console. With these solutions you can bring data from applications such as Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into your Redshift data warehouse in an efficient and streamlined way. It also lets you join these disparate datasets and analyze them together to produce actionable insights.
Auto-copy from Amazon S3: Amazon Redshift supports auto-copy to simplify and automate data loading from Amazon S3, reducing the time and effort needed to build custom solutions or manage third-party services. With this feature, Amazon Redshift eliminates the need to run copy procedures manually and repeatedly by automating file ingestion and taking care of continuous data loading steps under the hood. Auto-copy lets line-of-business users and data analysts without any data engineering knowledge create ingestion rules and configure the location of the data they wish to load from Amazon S3. As new data lands in the specified Amazon S3 folders, the ingestion process is triggered automatically based on user-defined configurations. Auto-copy supports all file formats that the Redshift COPY command supports, including CSV, JSON, Parquet, and Avro.
Native support for advanced analytics: Amazon Redshift supports standard scalar data types such as NUMERIC, VARCHAR, and TIMESTAMP and provides native support for the following advanced analytics processing:
- Spatial data processing: Amazon Redshift provides a polymorphic data type, GEOMETRY, that supports multiple geometric shapes such as Point, Linestring, and Polygon. Amazon Redshift also provides spatial SQL functions to construct geometric shapes, import, export, access, and process the spatial data. You can add GEOMETRY columns to Redshift tables and write SQL queries spanning across spatial and non-spatial data. This capability lets you store, retrieve, and process spatial data and seamlessly enhance your business insights by integrating spatial data into your analytical queries. With Amazon Redshift’s ability to seamlessly query data lakes, you can also easily extend spatial processing to data lakes by integrating external tables in spatial queries. See the documentation for more details.
- HyperLogLog sketches: HyperLogLog is a novel algorithm that efficiently estimates the approximate number of distinct values in a dataset. HLL sketch is a construct that encapsulates the information about the distinct values in the dataset. You can use HLL sketches to achieve significant performance benefits for queries that compute approximate cardinality over large datasets, with an average relative error of 0.01–0.6%. Amazon Redshift provides a first-class datatype HLLSKETCH and associated SQL functions to generate, persist, and combine HyperLogLog sketches. The Amazon Redshift HyperLogLog capability uses bias correction techniques and provides high accuracy with low memory footprint. See the documentation for more details.
- DATE & TIME data types: Amazon Redshift provides multiple data types (DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ) to natively store and process date/time data. TIME and TIMESTAMP types store the time data without time zone information, whereas TIMETZ and TIMESTAMPTZ types store the time data including the time zone information. You can use various date/time SQL functions to process the date and time values in Redshift queries. See the documentation for more details.
- Semi-structured data processing: The Amazon Redshift SUPER data type natively stores JSON and other semi-structured data in Redshift tables, and uses the PartiQL query language to seamlessly process the semi-structured data. The SUPER data type is schema-less in nature and allows storage of nested values that may contain Redshift scalar values, nested arrays, and nested structures. PartiQL is an extension of SQL and provides powerful querying capabilities such as object and array navigation, un-nesting of arrays, dynamic typing, and schema-less semantics. This lets you achieve advanced analytics that combine the classic structured SQL data with the semi-structured SUPER data with superior performance, flexibility, and ease of use. See the documentation for more details.
- Integration with third-party tools: There are many options to enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming, and visualizing data. Many Partners have certified their solutions to work with Amazon Redshift.
- Load and transform your data with Data Integration Partners.
- Analyze data and share insights across your organization with Business Intelligence Partners.
- Architect and implement your analytics platform with System Integration and Consulting Partners.
- Query, explore, and model your data using tools and utilities from Query and Data Modeling Partners.
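To make the HyperLogLog sketches described under native advanced analytics more concrete, here is a minimal Python sketch of the general technique: generate a sketch per dataset, combine sketches, and estimate the distinct count. It is a simplified textbook version written for illustration, not Amazon Redshift's implementation; the class name, the choice of SHA-1 as the hash, and the precision of 12 are all assumptions, and it omits the bias correction that the text attributes to Redshift.

```python
import hashlib
import math


class HllSketch:
    """Simplified HyperLogLog sketch (illustrative only)."""

    def __init__(self, precision: int = 12):
        self.precision = precision
        self.num_registers = 1 << precision
        self.registers = [0] * self.num_registers

    def add(self, value: str) -> None:
        # Take 64 bits of a hash of the value.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        hashed = int.from_bytes(digest[:8], "big")
        # The top `precision` bits choose a register.
        index = hashed >> (64 - self.precision)
        # The rest determines the rank: leading zeros plus one.
        rest_bits = 64 - self.precision
        rest = hashed & ((1 << rest_bits) - 1)
        rank = rest_bits - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HllSketch") -> "HllSketch":
        # Combining two sketches is a register-wise maximum.
        if other.precision != self.precision:
            raise ValueError("sketches must use the same precision")
        merged = HllSketch(self.precision)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        m = self.num_registers
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small cardinalities: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Two overlapping sets of hypothetical user IDs (50,000 distinct in total).
daily_a, daily_b = HllSketch(), HllSketch()
for i in range(30000):
    daily_a.add(f"user-{i}")
for i in range(20000, 50000):
    daily_b.add(f"user-{i}")

print(round(daily_a.merge(daily_b).estimate()))
```

Because sketches merge without revisiting the underlying rows, you can persist one sketch per day or per partition and combine them later, which is where the performance benefit for approximate distinct counts comes from.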
Price-performance at any scale
Gain up to 5x better price performance than other cloud data warehouses with automated optimizations to improve query speed.
RA3 instances: RA3 instances deliver up to 5x better price performance than any other cloud data warehouse service. These Amazon Redshift instances maximize the speed for performance-intensive workloads that require large amounts of compute capacity, with the flexibility to pay for compute independently of storage by specifying the number of instances you need. Learn more.
Efficient storage and high-performance query processing: Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to petabytes. Columnar storage, data compression, and zone maps reduce the amount of I/O needed to perform queries. Along with the industry-standard encodings such as LZO and Zstandard, Amazon Redshift also offers purpose-built compression encoding, AZ64, for numeric and date/time types to provide both storage savings and optimized query performance.
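To show why zone maps reduce I/O, the following Python sketch keeps the minimum and maximum value for each block of a column and skips any block whose range cannot contain the value being searched for. It is a toy model under stated assumptions: the `Block` class, the block size of four values, and the `scan_equal` helper are hypothetical and are not part of Amazon Redshift.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A block of column values plus its zone map (min and max)."""

    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equal(blocks, target):
    """Return matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        # Skip the block when the zone map rules out a match.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


# A sorted column of 12 values split into blocks of 4.
column = list(range(1, 13))
blocks = [Block(column[i:i + 4]) for i in range(0, len(column), 4)]
matches, blocks_read = scan_equal(blocks, 6)
print(matches, blocks_read)
```

The skipping works best when the data is sorted on the filtered column, because each block then covers a narrow, non-overlapping range.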
Limitless concurrency: Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries, whether they query data in your Redshift data warehouse or directly in your Amazon S3 data lake. Amazon Redshift Concurrency Scaling supports virtually unlimited concurrent users and concurrent queries with consistent service levels by adding transient capacity in seconds as concurrency increases. Learn more.
Materialized views: Amazon Redshift materialized views allow you to achieve significantly faster query performance for iterative or predictable analytical workloads such as dashboarding, queries from Business Intelligence (BI) tools, and extract, load, and transform (ELT) data processing jobs. You can use materialized views to easily store and manage precomputed results of a SELECT statement that may reference one or more tables, including external tables. Subsequent queries referencing the materialized views can run much faster by reusing the precomputed results. Amazon Redshift can efficiently maintain the materialized views incrementally to continue to provide the low latency performance benefits. Learn more.
Automated Materialized Views: Organizations are building more data-dependent applications, dashboards, reports, and ad hoc queries than ever before. Each application needs to be tuned and optimized, which requires time, resources, and money. Materialized views are a powerful tool for improving query performance, and you can set these up yourself if you have well-understood workloads. However, you might have increased and changing workloads where query patterns are not predictable. Automated Materialized Views improve query throughput, lower query latency, and shorten execution time through automatic refresh, auto query rewrite, incremental refresh, and continuous monitoring of Amazon Redshift clusters. Amazon Redshift balances the creation and management of AutoMVs with minimal resource utilization. Learn more.
Machine learning to maximize throughput and performance: Advanced ML capabilities in Amazon Redshift deliver high throughput and performance, even with varying workloads or concurrent user activity. Amazon Redshift uses sophisticated algorithms to predict and classify incoming queries based on their run times and resource requirements to dynamically manage performance and concurrency while also helping you prioritize your business-critical workloads. Short query acceleration (SQA) sends short queries from applications such as dashboards to an express queue for immediate processing rather than being starved behind large queries. Automatic workload management (WLM) uses ML to dynamically manage memory and concurrency, helping maximize query throughput. In addition, you can now easily set the priority of your most important queries, even when hundreds of queries are being submitted. Amazon Redshift is also a self-learning system that observes the user workload, determining the opportunities to improve performance as the usage grows, applying optimizations seamlessly, and making recommendations through Redshift Advisor when an explicit user action is needed to further turbocharge Redshift performance.
Result caching: Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that run repeat queries experience a significant performance boost. When a query runs, Amazon Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.
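The lookup logic described above can be sketched in a few lines of Python. This is a conceptual model only: the `ResultCache` class, the `data_version` counter used to detect changed data, and the stand-in query runner are assumptions made for illustration and do not reflect how Amazon Redshift implements its cache internally.

```python
class ResultCache:
    """Return cached results for repeat queries while the data is unchanged."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually executes SQL
        self._cache = {}             # query text -> (data_version, result)

    def execute(self, sql: str, data_version: int):
        key = sql.strip()
        cached = self._cache.get(key)
        if cached is not None and cached[0] == data_version:
            # Cache hit: same query text and the data has not changed.
            return cached[1], True
        # Cache miss or stale entry: run the query and remember the result.
        result = self._run_query(sql)
        self._cache[key] = (data_version, result)
        return result, False


executions = []


def fake_run_query(sql):
    # Stand-in for real query execution; records each run.
    executions.append(sql)
    return [("total_orders", 1250)]


cache = ResultCache(fake_run_query)
first = cache.execute("SELECT COUNT(*) FROM orders", data_version=1)
second = cache.execute("SELECT COUNT(*) FROM orders", data_version=1)
third = cache.execute("SELECT COUNT(*) FROM orders", data_version=2)
print(first[1], second[1], third[1], len(executions))
```

In this model the second call is served from the cache, while the third call re-runs the query because the data version changed.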
Petabyte-scale data warehousing: With a few clicks in the console or a simple API call, you can easily change the number or type of nodes in your data warehouse, and scale up or down as your needs change. With managed storage, capacity is added automatically to support workloads up to 8 PB of compressed data. You can also run queries against petabytes of data in Amazon S3 without having to load or transform any data with the Amazon Redshift Spectrum feature. You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats. Redshift Spectrum runs queries across thousands of parallelized nodes to deliver fast results, regardless of the complexity of the query or the amount of data.
Flexible pricing options: Amazon Redshift is the most cost-effective data warehouse, and you can optimize how you pay. You can start small for just $0.25 per hour with no commitments, and scale out for just $1,000 per terabyte per year. Amazon Redshift is the only cloud data warehouse that offers on-demand pricing with no upfront costs, Reserved Instance pricing that can save you up to 75% by committing to a one- or three-year term, and per-query pricing based on the amount of data scanned in your Amazon S3 data lake. Amazon Redshift’s pricing includes built-in security, data compression, backup storage, and data transfer. As the size of data grows, you use managed storage in the RA3 instances to store data cost-effectively at $0.024 per GB per month.
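For a sense of how the quoted rates translate into a monthly figure, here is a small worked calculation in Python. The rates are taken from the text above and may not match current pricing; the 730-hour month, the 5,000 GB example volume, and the function names are assumptions, and the reserved figure applies the full "up to 75%" saving, so treat it as a best case rather than a quote.

```python
from decimal import Decimal

# Rates quoted in the text above (verify against the pricing page).
MANAGED_STORAGE_PER_GB_MONTH = Decimal("0.024")
ON_DEMAND_ENTRY_RATE_PER_HOUR = Decimal("0.25")
MAX_RESERVED_DISCOUNT = Decimal("0.75")

# Assumption: an average month has about 730 hours.
HOURS_PER_MONTH = Decimal("730")


def monthly_storage_cost(gigabytes: int) -> Decimal:
    """Managed storage cost for one month."""
    return MANAGED_STORAGE_PER_GB_MONTH * Decimal(gigabytes)


def monthly_compute_cost(hourly_rate: Decimal) -> Decimal:
    """On-demand compute cost for a node that runs all month."""
    return hourly_rate * HOURS_PER_MONTH


def best_case_reserved_cost(on_demand_cost: Decimal) -> Decimal:
    """Cost after applying the maximum quoted Reserved Instance saving."""
    return on_demand_cost * (1 - MAX_RESERVED_DISCOUNT)


storage = monthly_storage_cost(5000)
on_demand = monthly_compute_cost(ON_DEMAND_ENTRY_RATE_PER_HOUR)
reserved = best_case_reserved_cost(on_demand)
print(storage, on_demand, reserved)
```

The storage and compute figures are computed separately on purpose, since the text quotes them for different node types.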
Predictable cost, even with unpredictable workloads: Amazon Redshift allows you to scale with minimal cost impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of 97% of customers. This provides you with predictability in your month-to-month cost, even during periods of fluctuating analytical demand.
Choose your node type to get the best value for your workloads: You can select from three instance types to optimize Amazon Redshift for your data warehousing needs: RA3 nodes, Dense Compute nodes, and Dense Storage nodes.
RA3 nodes let you scale storage independently of compute. With RA3, you get a high-performance data warehouse that stores data in a separate storage layer. You only need to size the data warehouse for the query performance that you need.
Dense Compute (DC) nodes allow you to create very high-performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs) and are the best choice for less than 500 GB of data.
Dense Storage (DS2) nodes let you create large data warehouses using hard disk drives (HDDs) for a low price point when you purchase the three-year Reserved Instances. Most customers who run on DS2 clusters can migrate their workloads to RA3 clusters and get up to twice the performance and more storage for the same cost as DS2.
Scaling your cluster or switching between node types requires a single API call or a few clicks in the AWS Management Console. Visit the pricing page for more information.
Easy, secure, and reliable
Focus on getting from data to insights in seconds and delivering on your business outcomes, without worrying about managing your data warehouse.
Amazon Redshift Serverless: Amazon Redshift Serverless is a serverless option of Amazon Redshift that makes it easy to run analytics in seconds and scale without the need to set up and manage data warehouse infrastructure. With Amazon Redshift Serverless, any user—including data analysts, developers, business professionals, and data scientists—can get insights from data by simply loading and querying data in the data warehouse. Learn more.
Query Editor v2: Use SQL to make your Amazon Redshift data and data lake more accessible to data analysts, data engineers, and other SQL users with a web-based analyst workbench for data exploration and analysis. Query Editor v2 lets you visualize query results in a single click, create schemas and tables, load data visually, and browse database objects. It also provides an intuitive editor for authoring SQL queries, analyses, visualizations, and annotations, and for sharing them securely with your team.
Automated Table Design: Amazon Redshift monitors user workloads and uses sophisticated algorithms to find ways to improve the physical layout of data to optimize query speeds. Automatic Table Optimization selects the best sort and distribution keys to optimize performance for the cluster’s workload. If Amazon Redshift determines that applying a key will improve cluster performance, tables will be automatically altered without requiring administrator intervention. The additional features Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze eliminate the need for manual maintenance and tuning of Redshift clusters to get the best performance for new clusters and production workloads.
Query using your own tools: Amazon Redshift gives you the flexibility to run queries within the console or connect SQL client tools, libraries, or data science tools including Amazon QuickSight, Tableau, Power BI, Querybook, and Jupyter Notebook.
Simple API to interact with Amazon Redshift: Amazon Redshift lets you painlessly access data from all types of traditional, cloud-native, containerized, serverless web services-based, and event-driven applications. The Amazon Redshift Data API simplifies data access, ingest, and egress from programming languages and platforms supported by the AWS SDK, such as Python, Go, Java, Node.js, PHP, Ruby, and C++. The Data API eliminates the need for configuring drivers and managing database connections. Instead, you can run SQL commands against an Amazon Redshift cluster by simply calling a secured API endpoint provided by the Data API. The Data API takes care of managing database connections and buffering data. The Data API is asynchronous, so you can retrieve your results later. Your query results are stored for 24 hours.
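Because the Data API is asynchronous, a typical caller submits a statement, polls for its status, and then fetches the results. The Python sketch below shows that flow using the `execute_statement`, `describe_statement`, and `get_statement_result` operations of the AWS SDK for Python (boto3) `redshift-data` client. The `run_sql` helper, its parameters, and the polling interval are assumptions for illustration; the client is passed in so the function carries no AWS configuration of its own, and only the first page of results is returned.

```python
import time


def run_sql(client, sql, *, cluster_id, database, db_user,
            poll_seconds=0.5, timeout_seconds=60.0):
    """Submit a statement through the Data API and wait for its rows."""
    response = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    statement_id = response["Id"]
    deadline = time.monotonic() + timeout_seconds

    # Poll until the statement finishes, fails, or the timeout elapses.
    while True:
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(description.get("Error", status))
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {statement_id} still {status}")
        time.sleep(poll_seconds)

    if not description.get("HasResultSet"):
        return []
    # Only the first page is fetched here; larger results are paginated.
    return client.get_statement_result(Id=statement_id)["Records"]


# Example wiring (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# rows = run_sql(boto3.client("redshift-data"), "SELECT 1",
#                cluster_id="example-cluster", database="dev", db_user="example_user")
```

Passing the client in also makes the helper easy to exercise with a stub object, without a live cluster.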
Fault tolerant: There are multiple features that enhance the reliability of your data warehouse cluster. For example, Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance. Clusters can also be relocated to alternative Availability Zones (AZs) without any data loss or application changes.
AWS has comprehensive security capabilities to satisfy the most demanding requirements, and Amazon Redshift provides data security out-of-the-box at no extra cost.
Granular access controls: Granular row and column level security controls ensure that users see only the data they should have access to. Amazon Redshift is integrated with AWS Lake Formation, ensuring that Lake Formation’s column level access controls are also enforced for Redshift queries on the data in the data lake.
Amazon Redshift data sharing supports centralized access control with AWS Lake Formation to simplify governance of data shared from Amazon Redshift. AWS Lake Formation (LF) is a service that makes it easy to set up secure data lakes, to centrally manage granular access to data across all consuming services, and to apply row-level and column-level controls.
Dynamic Data Masking: With Dynamic Data Masking, customers can easily protect their sensitive data by limiting how much identifiable data is visible to users. Customers can also define multiple levels of permissions on these fields so that different users and groups have varying levels of data access without having to create multiple copies of data, all through Redshift's familiar SQL interface.
Multi AZ: The new Redshift Multi-AZ configuration further expands the recovery capabilities by reducing recovery time and guaranteeing capacity to automatically recover with no data loss. A Redshift Multi-AZ data warehouse maximizes performance and value by delivering high availability without having to use standby resources.
End-to-end encryption: With just a few parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit, and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk will be encrypted as well as any backups. Amazon Redshift takes care of key management by default.
Network isolation: Amazon Redshift lets you configure firewall rules to control network access to your data warehouse cluster. You can run Amazon Redshift inside Amazon Virtual Private Cloud (VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using an industry-standard encrypted IPsec VPN.
Audit and compliance: Amazon Redshift integrates with AWS CloudTrail to enable you to audit all Redshift API calls. Redshift logs all SQL operations, including connection attempts, queries, and changes to your data warehouse. You can access these logs using SQL queries against system tables, or save the logs to a secure location in Amazon S3. Amazon Redshift is compliant with SOC1, SOC2, SOC3, and PCI DSS Level 1 requirements. For more details, visit AWS Cloud Compliance.
Tokenization: AWS Lambda user-defined functions (UDFs) let you use an AWS Lambda function as a UDF in Amazon Redshift and invoke it from Redshift SQL queries. With this functionality, you can write custom extensions for your SQL queries to achieve tighter integration with other services or third-party products. You can write Lambda UDFs to enable external tokenization, data masking, and identification or de-identification of data by integrating with vendors such as Protegrity, and to protect or unprotect sensitive data based on a user’s permissions and groups at query time.
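As a minimal sketch of what the Lambda side of such a UDF might look like, the Python handler below receives a batch of rows in the event's `arguments` field and returns a JSON document with `success`, `num_records`, and `results`, which is the request and response shape Lambda UDFs use. The tokenization itself is an assumption: a keyed HMAC stands in for a call to a tokenization vendor, it is one-way rather than reversible, and the hard-coded key is a placeholder that a real function would load from a secrets store.

```python
import hashlib
import hmac
import json

# Placeholder key for illustration only; never hard-code real secrets.
SECRET_KEY = b"example-only-key"


def tokenize(value):
    """Return a deterministic 16-character token, or None for NULL input."""
    if value is None:
        return None
    digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def lambda_handler(event, context):
    """Handle one batch of rows sent by a Redshift Lambda UDF call."""
    try:
        # Each entry in "arguments" holds the argument values for one row.
        results = [tokenize(row[0]) for row in event["arguments"]]
        response = {
            "success": True,
            "num_records": len(results),
            "results": results,
        }
    except Exception as exc:
        response = {"success": False, "error_msg": str(exc)}
    return json.dumps(response)


# Local usage example with a hypothetical batch of three rows.
sample_event = {
    "num_records": 3,
    "arguments": [["alice@example.com"], [None], ["alice@example.com"]],
}
print(lambda_handler(sample_event, None))
```

The results must be returned in the same order as the input rows, because Redshift matches them back to rows by position.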
Find out more about what’s new.
Visit Amazon Redshift Documentation for more detailed product information.