Amazon Redshift accelerates your time to insights with fast, easy, and secure cloud data warehousing at scale
Features and benefits
Each year we release hundreds of features and product improvements, driven by customer use cases and feedback.
Easy analytics for everyone
Focus on getting from data to insights in seconds and delivering on your business outcomes, without worrying about managing your data warehouse.
Amazon Redshift Serverless (preview): Redshift Serverless is a serverless option of Amazon Redshift that lets you run and scale analytics in seconds without setting up and managing data warehouse infrastructure. With Redshift Serverless, any user, including data analysts, developers, business professionals, and data scientists, can get insights from data simply by loading and querying it in the data warehouse.
Query Editor v2: Make your data in Amazon Redshift and your data lake more accessible to data analysts, data engineers, and other SQL users with a web-based analyst workbench for data exploration and analysis using SQL. Query Editor v2 lets you visualize query results in a single click, create schemas and tables, load data visually, and browse database objects. It also provides an intuitive editor for authoring SQL queries and for securely sharing queries, analyses, visualizations, and annotations with your team.
Automated Table Design: Amazon Redshift monitors user workloads and uses sophisticated algorithms to find ways to improve the physical layout of data and speed up queries. Automatic Table Optimization selects the best sort and distribution keys for the cluster’s workload. If Amazon Redshift determines that applying a key will improve cluster performance, it alters the tables automatically, without administrator intervention. The additional features Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze eliminate the need to manually maintain and tune Redshift clusters, so new clusters and production workloads get the best performance.
Query using your own tools: Amazon Redshift gives you the flexibility to run queries within the console or to connect SQL client tools, libraries, or data science tools, including Amazon QuickSight, Tableau, Microsoft Power BI, Querybook, and Jupyter Notebook.
Simple API to interact with Amazon Redshift: Amazon Redshift lets you access data from all types of applications: traditional, cloud-native, containerized, serverless web-service-based, and event-driven. The Amazon Redshift Data API simplifies data access, ingest, and egress from the programming languages and platforms supported by the AWS SDK, such as Python, Go, Java, Node.js, PHP, Ruby, and C++. The Data API eliminates the need to configure drivers and manage database connections. Instead, you run SQL commands against an Amazon Redshift cluster by calling a secured API endpoint provided by the Data API, which manages database connections and buffers data for you. The Data API is asynchronous, so you can retrieve your results later; query results are stored for 24 hours.
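As a minimal sketch of the asynchronous pattern described above, the Python function below submits a statement, polls until it reaches a terminal state, and pages through the result. It assumes the AWS SDK for Python (boto3) `redshift-data` client and uses its `execute_statement`, `describe_statement`, and `get_statement_result` calls; the function name `run_query`, the polling interval, and the timeout are illustrative choices, not part of the Data API.

```python
"""Sketch: run SQL through the Amazon Redshift Data API and poll for the result."""
import time
from typing import Any, Dict, List, Optional

# Terminal statuses reported by DescribeStatement.
_DONE = {"FINISHED", "FAILED", "ABORTED"}


def _field_value(field: Dict[str, Any]) -> Any:
    """Unwrap one Data API field, e.g. {"stringValue": "x"} -> "x"."""
    if field.get("isNull"):
        return None
    # Each field carries exactly one typed value key.
    for key in ("stringValue", "longValue", "doubleValue", "booleanValue", "blobValue"):
        if key in field:
            return field[key]
    return None


def run_query(client, sql: str, cluster_id: str, database: str, db_user: str,
              poll_seconds: float = 1.0, timeout_seconds: float = 300.0) -> List[List[Any]]:
    """Submit `sql`, wait for it to finish, and return all rows as Python lists."""
    stmt = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )
    statement_id = stmt["Id"]

    # The API is asynchronous: poll until the statement reaches a terminal state.
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status in _DONE:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {statement_id} still {status}")
        time.sleep(poll_seconds)

    if status != "FINISHED":
        raise RuntimeError(f"statement {statement_id} {status}: {desc.get('Error', '')}")
    if not desc.get("HasResultSet"):
        return []

    # Results can span several pages; follow NextToken until it is absent.
    rows: List[List[Any]] = []
    token: Optional[str] = None
    while True:
        kwargs = {"Id": statement_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_statement_result(**kwargs)
        rows.extend([_field_value(f) for f in record] for record in page["Records"])
        token = page.get("NextToken")
        if not token:
            return rows
```

In practice you would pass `boto3.client("redshift-data")` as `client`, for example `run_query(client, "SELECT 1", "my-cluster", "dev", "awsuser")`, where the cluster, database, and user names are placeholders. Taking the client as a parameter keeps network setup out of the function, so it can also be exercised with a stand-in object.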
Fault tolerant: There are multiple features that enhance the reliability of your data warehouse cluster. For example, Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance. Clusters can also be relocated to alternative Availability Zones (AZs) without any data loss or application changes.
Analyze all your data
Get integrated insights by running real-time and predictive analytics on complex, large-scale data across your operational databases, data lake, data warehouse, and thousands of third-party datasets.
Federated query: With the new federated query capability in Amazon Redshift, you can reach into your operational relational databases. Query live data across one or more Amazon Relational Database Service (RDS) for PostgreSQL, Amazon Aurora PostgreSQL, RDS for MySQL, and Aurora MySQL databases to get instant visibility into your full business operations without requiring data movement. You can join data from your Redshift data warehouse, your data lake, and your operational stores to make better data-driven decisions. Amazon Redshift applies sophisticated optimizations to reduce the data moved over the network and complements them with massively parallel processing for high-performance queries.
Query and export data to and from your data lake: No other cloud data warehouse makes it as easy to both query data and write data back to your data lake in open formats. You can query open file formats such as Parquet, ORC, JSON, Avro, CSV, and more directly in S3 using familiar ANSI SQL. To export data to your data lake, simply use the Amazon Redshift UNLOAD command in your SQL code and specify Parquet as the file format, and Amazon Redshift automatically takes care of data formatting and data movement into S3. This gives you the flexibility to store highly structured, frequently accessed data and semi-structured data in an Amazon Redshift data warehouse, while keeping up to exabytes of structured, semi-structured and unstructured data in Amazon S3. Exporting data from Amazon Redshift back to your data lake lets you analyze the data further with AWS services such as Amazon Athena, Amazon EMR, and Amazon SageMaker.
AWS services integration: Native integration with AWS database, analytics, and machine learning services makes it easier to handle complete analytics workflows without friction. For example, AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. AWS Glue can extract, transform, and load (ETL) data into Amazon Redshift. Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load streaming data into Amazon Redshift for near real-time analytics. You can use Amazon EMR to process data using Hadoop/Spark and load the output into Amazon Redshift for BI and analytics. Amazon QuickSight is the first BI service with pay-per-session pricing that you can use to create reports, visualizations, and dashboards on Redshift data. You can use Amazon Redshift to prepare your data to run machine learning (ML) workloads with Amazon SageMaker. To accelerate migrations to Amazon Redshift, you can use the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS). Amazon Redshift is also deeply integrated with AWS Key Management Service (AWS KMS) and Amazon CloudWatch for security, monitoring, and compliance. You can also use Lambda UDFs to invoke an AWS Lambda function from your SQL queries as if you were invoking a user-defined function (UDF) in Amazon Redshift. You can write Lambda UDFs to integrate with AWS Partner services and to access other popular AWS services such as Amazon DynamoDB and Amazon SageMaker.
Partner console integration: You can accelerate data onboarding and create valuable business insights in minutes by integrating with select Partner solutions in the Amazon Redshift console. With these solutions, you can bring data from applications such as Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into your Redshift data warehouse efficiently. They also let you join these disparate datasets and analyze them together to produce actionable insights.
Data Sharing: Amazon Redshift data sharing lets you extend the ease of use, performance, and cost benefits of Amazon Redshift from a single cluster to multi-cluster deployments while sharing data among them. Data sharing enables instant, granular, and fast data access across Redshift clusters without copying or moving data. It provides live access to data, so your users always see the most current and consistent information as it is updated in the data warehouse. You can securely share live data with Redshift clusters in the same or different AWS accounts and across Regions.
AWS Data Exchange for Amazon Redshift: Query Amazon Redshift datasets from your own Redshift cluster without extracting, transforming, and loading (ETL) the data. You can subscribe to Redshift cloud data warehouse products in AWS Data Exchange. As soon as a provider makes an update, the change is visible to subscribers. If you are a data provider, access is automatically granted when a subscription starts and revoked when it ends, invoices are automatically generated when payments are due, and payments are collected through AWS. You can license access to flat files, data in Amazon Redshift, and data delivered through APIs, all with a single subscription.
Redshift ML: Redshift ML makes it easy for data analysts, data scientists, BI professionals, and developers to create, train, and deploy Amazon SageMaker models using SQL. With Redshift ML, you can use SQL statements to create and train Amazon SageMaker models on your data in Amazon Redshift and then use those models for predictions such as churn detection, financial forecasting, personalization, and risk scoring directly in your queries and reports.
Native support for advanced analytics: Amazon Redshift supports standard scalar data types such as NUMBER, VARCHAR, and DATETIME and provides native support for the following advanced analytics processing:
- Spatial data processing: Amazon Redshift provides a polymorphic data type, GEOMETRY, that supports multiple geometric shapes such as Point, Linestring, and Polygon. Amazon Redshift also provides spatial SQL functions to construct geometric shapes and to import, export, access, and process spatial data. You can add GEOMETRY columns to Redshift tables and write SQL queries that span spatial and non-spatial data. This capability lets you store, retrieve, and process spatial data and enhance your business insights by integrating spatial data into your analytical queries. Because Amazon Redshift can query data lakes, you can also extend spatial processing to data lakes by including external tables in spatial queries. See the documentation for more details.
- HyperLogLog sketches: HyperLogLog is an algorithm that efficiently estimates the approximate number of distinct values in a dataset. An HLL sketch is a construct that encapsulates the information about the distinct values in the dataset. You can use HLL sketches to achieve significant performance benefits for queries that compute approximate cardinality over large datasets, with an average relative error of 0.01–0.6%. Amazon Redshift provides a first-class data type, HLLSKETCH, and associated SQL functions to generate, persist, and combine HyperLogLog sketches. The Amazon Redshift HyperLogLog capability uses bias correction techniques and provides high accuracy with a low memory footprint. See the documentation for more details.
- DATE & TIME data types: Amazon Redshift provides the DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ data types to natively store and process date/time data. TIME and TIMESTAMP types store time data without time zone information, whereas TIMETZ and TIMESTAMPTZ types store time data including time zone information. You can use various date/time SQL functions to process date and time values in Redshift queries. See the documentation for more details.
- Semi-structured data processing: The Amazon Redshift SUPER data type natively stores JSON and other semi-structured data in Redshift tables, and uses the PartiQL query language to seamlessly process the semi-structured data. The SUPER data type is schema-less in nature and allows storage of nested values that may contain Redshift scalar values, nested arrays, and nested structures. PartiQL is an extension of SQL and provides powerful querying capabilities such as object and array navigation, un-nesting of arrays, dynamic typing, and schema-less semantics. This lets you achieve advanced analytics that combine the classic structured SQL data with the semi-structured SUPER data with superior performance, flexibility, and ease of use. See the documentation for more details.
- Integration with third-party tools: There are many options to enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming, and visualizing data. Our Partners have certified an extensive list of solutions to work with Amazon Redshift.
- Load and transform your data with Data Integration Partners.
- Analyze data and share insights across your organization with Business Intelligence Partners.
- Architect and implement your analytics platform with System Integration and Consulting Partners.
- Query, explore, and model your data using tools and utilities from Query and Data Modeling Partners.
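To make the HyperLogLog description concrete, here is a textbook HyperLogLog in Python: hash each value, use the top bits to pick a register, keep the longest run of leading zeros seen in each register, combine registers with a harmonic mean, and fall back to linear counting for small cardinalities. This is a sketch of the general algorithm under our own choices of hash (BLAKE2b truncated to 64 bits) and precision (p = 14); it does not reproduce the HLLSKETCH storage format or the bias-correction techniques Amazon Redshift uses. The `merge` method shows why sketches can be combined: the union of two sketches is their register-wise maximum.

```python
"""Sketch of the HyperLogLog idea: approximate distinct counts from small, mergeable registers."""
import hashlib
import math
from typing import Iterable, List


class HyperLogLog:
    def __init__(self, p: int = 14):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers (16384 for p=14)
        self.registers: List[int] = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Deterministic 64-bit hash of the value's string form.
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value) -> None:
        h = self._hash64(value)
        idx = h >> (64 - self.p)                   # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64-p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`; 64-p+1 if rest == 0.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def update(self, values: Iterable) -> "HyperLogLog":
        for v in values:
            self.add(v)
        return self

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Union of two sketches = register-wise maximum (same precision required).
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

With p = 14 the sketch keeps 16,384 registers no matter how many values are added (each fits in 6 bits in a compact encoding), and the expected relative error of plain HyperLogLog is roughly 1.04/√m, about 0.8% here.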
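The navigation and unnesting semantics described for SUPER and PartiQL can be approximated in plain Python over JSON-like values. The helpers below are hypothetical and are not Redshift's implementation: `navigate` follows attribute names and array indexes and returns `None` (standing in for NULL) when a step does not exist, which mirrors what we understand to be Redshift's default lax behavior for a path such as `c.orders[1].amount`; `unnest` pairs each row with each element of an array attribute, roughly like a `FROM customers c, c.orders o` clause.

```python
"""Sketch: PartiQL-style navigation and unnesting over JSON-like values."""
from typing import Any, Iterator, List, Tuple, Union

PathStep = Union[str, int]


def navigate(value: Any, path: List[PathStep]) -> Any:
    """Follow attribute names (str) and array indexes (int); return None if any step is missing."""
    for step in path:
        if isinstance(step, str) and isinstance(value, dict):
            value = value.get(step)
        elif isinstance(step, int) and isinstance(value, list) and 0 <= step < len(value):
            value = value[step]
        else:
            return None  # lax semantics: an invalid path yields NULL instead of an error
        if value is None:
            return None
    return value


def unnest(rows: List[dict], array_attr: str) -> Iterator[Tuple[dict, Any]]:
    """Pair each row with every element of its array attribute; rows without an array yield nothing."""
    for row in rows:
        items = row.get(array_attr)
        if isinstance(items, list):
            for item in items:
                yield row, item
```

For example, with a customer document `{"name": "Ana", "orders": [{"id": 1, "amount": 30}, {"id": 2, "amount": 12}]}`, `navigate(doc, ["orders", 1, "amount"])` returns 12, while `navigate(doc, ["orders", 5, "amount"])` returns `None` rather than raising.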
Performance at any scale
Gain up to 3x better price performance than other cloud data warehouses with automated optimizations to improve query speed.
RA3 instances: RA3 instances deliver up to 3 times better price performance than any other cloud data warehouse service. These Amazon Redshift instances maximize speed for performance-intensive workloads that require large amounts of compute capacity, with the flexibility to pay for compute independently of storage by specifying the number of instances you need.
Advanced Query Accelerator (AQUA) for Amazon Redshift: AQUA is a new distributed and hardware-accelerated cache that enables Amazon Redshift to run up to 10 times faster than other enterprise cloud data warehouses by automatically boosting certain types of queries. AQUA uses high-speed solid state storage, field-programmable gate arrays (FPGAs), and AWS Nitro to speed queries that scan, filter, and aggregate large datasets. AQUA is included with the Redshift RA3 instance type at no additional cost.
Efficient storage and high-performance query processing: Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to petabytes. Columnar storage, data compression, and zone maps reduce the amount of I/O needed to perform queries. Along with industry-standard encodings such as LZO and Zstandard, Amazon Redshift also offers a purpose-built compression encoding, AZ64, for numeric and date/time types that provides both storage savings and optimized query performance.
Limitless concurrency: Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries, whether they query data in your Redshift data warehouse or directly in your Amazon S3 data lake. Amazon Redshift Concurrency Scaling supports virtually unlimited concurrent users and concurrent queries with consistent service levels by adding transient capacity in seconds as concurrency increases.
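The effect of zone maps can be illustrated with a small Python model: a column split into blocks, each tagged with its minimum and maximum value, so that a range filter skips any block whose range cannot match. The block size and helper names are hypothetical, and Redshift's actual block layout and compression are not modeled.

```python
"""Sketch: a column stored in fixed-size blocks with per-block min/max (a zone map)."""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]
    lo: int   # smallest value in the block
    hi: int   # largest value in the block


def build_column(values: List[int], block_size: int) -> List[Block]:
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_between(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and how many blocks had to be read."""
    hits: List[int] = []
    blocks_read = 0
    for b in blocks:
        if b.hi < low or b.lo > high:
            continue  # the zone map proves no match: skip the block without reading it
        blocks_read += 1
        hits.extend(v for v in b.values if low <= v <= high)
    return hits, blocks_read
```

With `build_column(list(range(1000)), block_size=100)`, a filter for values between 250 and 349 reads only 2 of the 10 blocks. Because this sample data is sorted, each block covers a narrow range; if the same values were shuffled, most blocks would span nearly the whole range and few could be skipped, which is why sort keys make zone maps more effective.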
Materialized views: Amazon Redshift materialized views let you achieve significantly faster query performance for iterative or predictable analytical workloads such as dashboarding, queries from business intelligence (BI) tools, and extract, load, and transform (ELT) data processing jobs. You can use materialized views to store and manage precomputed results of a SELECT statement that may reference one or more tables, including external tables. Subsequent queries that reference the materialized views can run much faster by reusing the precomputed results. Amazon Redshift can maintain materialized views incrementally to continue providing low-latency performance.
Machine learning to maximize throughput and performance: Advanced ML capabilities in Amazon Redshift deliver high throughput and performance, even with varying workloads or concurrent user activity. Amazon Redshift uses sophisticated algorithms to predict and classify incoming queries based on their run times and resource requirements, dynamically managing performance and concurrency while helping you prioritize your business-critical workloads. Short query acceleration (SQA) sends short queries from applications such as dashboards to an express queue for immediate processing, so they are not starved behind large queries. Automatic workload management (WLM) uses ML to dynamically manage memory and concurrency, helping maximize query throughput. In addition, you can easily set the priority of your most important queries, even when hundreds of queries are being submitted. Amazon Redshift is also a self-learning system: it observes the user workload, identifies opportunities to improve performance as usage grows, applies optimizations seamlessly, and makes recommendations through Redshift Advisor when an explicit user action is needed to further improve performance.
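Incremental maintenance is what keeps a materialized view fast as its base table grows. The Python sketch below assumes an insert-only base table and a view of `SUM(amount)` and `COUNT(*)` grouped by region (a hypothetical schema); each refresh folds in only the new rows, and `recompute` is the full re-scan it avoids. Redshift decides for itself whether a given view can be refreshed incrementally, and handling deletes and updates, which this sketch ignores, needs additional bookkeeping.

```python
"""Sketch: incrementally maintaining a SUM/COUNT-by-key aggregate, the idea behind incremental MV refresh."""
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, float]   # (region, amount) -- hypothetical base-table row


class SalesByRegionView:
    """Precomputed SUM(amount) and COUNT(*) per region, refreshed from new rows only."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def refresh(self, new_rows: Iterable[Row]) -> None:
        # Incremental refresh: fold in only rows inserted since the last refresh.
        for region, amount in new_rows:
            self.totals[region] += amount
            self.counts[region] += 1

    def query(self, region: str) -> Tuple[float, int]:
        # Reading the view is a lookup, not a scan of the base table.
        return self.totals.get(region, 0.0), self.counts.get(region, 0)


def recompute(rows: Iterable[Row]) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Full recomputation over every base row, for comparison."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
        counts[region] += 1
    return dict(totals), dict(counts)
```

The design choice is the usual trade-off: the view stores a little extra state (a running count alongside each sum) so that each refresh costs time proportional to the new rows rather than to the whole table.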
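The benefit of routing short queries separately can be seen in a toy scheduling model. In the Python sketch below, which is an illustration rather than how SQA is implemented, every query arrives at time zero with a predicted and an actual run time (hypothetical numbers), each lane runs one query at a time in arrival order, and queries predicted to take no more than a threshold go to an express lane. Redshift derives its predictions from ML models; here the prediction is simply supplied.

```python
"""Sketch: a toy model of sending predicted-short queries to an express lane (the idea behind SQA)."""
from typing import Dict, List, Optional, Tuple

Query = Tuple[str, float, float]  # (query id, predicted seconds, actual seconds) -- hypothetical shape


def finish_times(queries: List[Query], short_threshold_s: Optional[float]) -> Dict[str, float]:
    """All queries arrive at t=0 in list order; each lane runs one query at a time, FIFO.

    With short_threshold_s=None everything shares one lane; otherwise queries whose
    *predicted* run time is at or below the threshold go to a separate express lane.
    """
    lane_clock = {"main": 0.0, "express": 0.0}
    finished: Dict[str, float] = {}
    for qid, predicted, actual in queries:
        short = short_threshold_s is not None and predicted <= short_threshold_s
        lane = "express" if short else "main"
        lane_clock[lane] += actual          # the query occupies its lane for its actual run time
        finished[qid] = lane_clock[lane]
    return finished
```

For example, with `work = [("etl", 600.0, 620.0), ("dash1", 1.0, 0.5)]`, `finish_times(work, None)["dash1"]` is 620.5, while `finish_times(work, 5.0)["dash1"]` is 0.5; the ETL job finishes at 620.0 either way.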
Result caching: Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that run repeat queries experience a significant performance boost. When a query runs, Amazon Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.
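The cache check described above can be sketched in Python as follows. This is a simplified model under our own assumptions, not Redshift's mechanism: entries are keyed on the exact query text, each table carries a version number that a hypothetical `record_write` bumps on every change, and a cached result is reused only if the versions of all tables the query read are unchanged.

```python
"""Sketch: a result cache keyed on query text and invalidated by changes to the tables it read."""
from typing import Callable, Dict, FrozenSet, List, Tuple


class ResultCache:
    def __init__(self) -> None:
        self.table_versions: Dict[str, int] = {}
        # query text -> (versions of the tables it read, cached rows)
        self._entries: Dict[str, Tuple[Dict[str, int], List[tuple]]] = {}

    def record_write(self, table: str) -> None:
        # Any insert, update, or delete bumps the table's version.
        self.table_versions[table] = self.table_versions.get(table, 0) + 1

    def run(self, sql: str, tables: FrozenSet[str],
            execute: Callable[[], List[tuple]]) -> Tuple[List[tuple], bool]:
        """Return (rows, cache_hit); `execute` is only called on a miss."""
        snapshot = {t: self.table_versions.get(t, 0) for t in tables}
        entry = self._entries.get(sql)
        if entry is not None and entry[0] == snapshot:
            return entry[1], True               # data unchanged: reuse the prior result
        rows = execute()
        self._entries[sql] = (snapshot, rows)
        return rows, False
```

Running the same dashboard query twice returns the second result from the cache; after a write to a table it reads, the next run executes the query again and refreshes the entry.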
Petabyte-scale data warehousing: With a few clicks in the console or a simple API call, you can change the number or type of nodes in your data warehouse and scale up or down as your needs change. With managed storage, capacity is added automatically to support workloads of up to 8 PB of compressed data. With the Amazon Redshift Spectrum feature, you can also run queries against petabytes of data in Amazon S3 without having to load or transform it. You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats. Redshift Spectrum runs queries across thousands of parallelized nodes to deliver fast results, regardless of the complexity of the query or the amount of data.
Flexible pricing options: Amazon Redshift is the most cost-effective data warehouse, and you can optimize how you pay. You can start small for just $0.25 per hour with no commitments and scale out for just $1,000 per terabyte per year. Amazon Redshift is the only cloud data warehouse that offers on-demand pricing with no upfront costs, Reserved Instance pricing that can save you up to 75% when you commit to a one- or three-year term, and per-query pricing based on the amount of data scanned in your Amazon S3 data lake. Amazon Redshift’s pricing includes built-in security, data compression, backup storage, and data transfer. As your data grows, you can use managed storage in RA3 instances to store it cost-effectively at $0.024 per GB per month.
Predictable cost, even with unpredictable workloads: Amazon Redshift allows you to scale with minimal cost impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of 97% of customers. This provides you with predictability in your month-to-month cost, even during periods of fluctuating analytical demand.
Choose your node type to get the best value for your workloads: You can select from three node types to optimize Amazon Redshift for your data warehousing needs: RA3 nodes, Dense Compute nodes, and Dense Storage nodes.
RA3 nodes let you scale storage independently of compute. With RA3, you get a high-performance data warehouse that stores data in a separate storage layer. You only need to size the data warehouse for the query performance that you need.
Dense Compute (DC) nodes allow you to create very high-performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs) and are the best choice for less than 500 GB of data.
Dense Storage (DS2) nodes let you create large data warehouses using hard disk drives (HDDs) at a low price point when you purchase three-year Reserved Instances. Most customers who run on DS2 clusters can migrate their workloads to RA3 clusters and get up to twice the performance and more storage for the same cost as DS2.
Scaling your cluster or switching between node types requires a single API call or a few clicks in the AWS Management Console. Visit the pricing page for more information.
Most secure and compliant
AWS has comprehensive security capabilities to satisfy the most demanding requirements, and Amazon Redshift provides data security out-of-the-box at no extra cost.
End-to-end encryption: With just a few parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit and hardware-accelerated AES-256 encryption for data at rest. If you enable encryption of data at rest, all data written to disk, as well as any backups, is encrypted. Amazon Redshift takes care of key management by default.
Network isolation: Amazon Redshift lets you configure firewall rules to control network access to your data warehouse cluster. You can run Amazon Redshift inside Amazon Virtual Private Cloud (VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using an industry-standard encrypted IPsec VPN.
Audit and compliance: Amazon Redshift integrates with AWS CloudTrail to enable you to audit all Redshift API calls. Redshift logs all SQL operations, including connection attempts, queries, and changes to your data warehouse. You can access these logs using SQL queries against system tables, or save the logs to a secure location in Amazon S3. Amazon Redshift is compliant with SOC1, SOC2, SOC3, and PCI DSS Level 1 requirements. For more details, visit AWS Cloud Compliance.
Tokenization: AWS Lambda user-defined functions (UDFs) let you use an AWS Lambda function as a UDF in Amazon Redshift and invoke it from Redshift SQL queries. With this functionality, you can write custom extensions for your SQL queries to achieve tighter integration with other services or third-party products. You can write Lambda UDFs to enable external tokenization, data masking, and identification or de-identification of data by integrating with vendors such as Protegrity, and to protect or unprotect sensitive data at query time based on a user’s permissions and groups.
Granular access controls: Granular row-level and column-level security controls ensure that users see only the data they should have access to. Amazon Redshift is integrated with AWS Lake Formation, so Lake Formation’s column-level access controls are also enforced for Redshift queries on data in the data lake.
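As a sketch of the tokenization pattern, the Python handler below follows the Lambda UDF contract as we understand it: Redshift sends a batch in which `event["arguments"]` holds one list of arguments per row, and the function returns a JSON string with `success`, `num_records`, and `results` (one value per row), or `success: false` with an `error_msg`. The tokenization itself, a keyed HMAC-SHA256 truncated to 32 hex characters using a hypothetical `TOKEN_KEY` environment variable, is our own illustrative choice; it is one-way, so unlike vendor tokenization services such as Protegrity it cannot unprotect data.

```python
"""Sketch of a Lambda handler for a Redshift Lambda UDF that tokenizes values with HMAC-SHA256."""
import hashlib
import hmac
import json
import os

# Hypothetical: key supplied via an environment variable; use a secrets store in practice.
_KEY = os.environ.get("TOKEN_KEY", "demo-only-key").encode("utf-8")


def tokenize(value: str) -> str:
    # Deterministic: the same input always yields the same token, so joins and GROUP BY still work.
    return "tok_" + hmac.new(_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:32]


def lambda_handler(event, context):
    # Redshift batches rows: event["arguments"] is a list with one list of UDF arguments per row.
    try:
        results = [
            None if args[0] is None else tokenize(str(args[0]))  # NULL in, NULL out
            for args in event["arguments"]
        ]
        return json.dumps({"success": True, "num_records": len(results), "results": results})
    except Exception as exc:  # report failures in the response rather than crashing
        return json.dumps({"success": False, "error_msg": str(exc)})
```

The function would be registered in Redshift with `CREATE EXTERNAL FUNCTION` (naming the Lambda function and an IAM role) and then called from SQL like any scalar function.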