Amazon Redshift Capabilities

Modern, scalable, secure, and high-performance cloud data warehousing to analyze all your data.

Achieve best price-performance at any scale

Meet your needs for a highly scalable, performant, and reliable modern cloud data warehouse that handles growing data for any number of concurrent users. Delivering the best price-performance for your workloads, Amazon Redshift runs on a Massively Parallel Processing (MPP) architecture and RA3 instances that separate compute and storage. Run and scale any kind of analytics workload cost-effectively, without managing data warehouse infrastructure, using Amazon Redshift Serverless, complete with AI-driven scaling and optimizations. As your business's analytics needs grow more demanding, a reliable cloud data warehouse such as Amazon Redshift becomes imperative for minimizing disruptions, with Multi-AZ deployments delivering a 99.99% SLA.

Unify all your data with zero-ETL approaches

Break through data silos in your organization and build an end-to-end data strategy to analyze all your data. Amazon Redshift employs a zero-ETL approach that enables interoperability and integration between the data warehouse, your Amazon S3 data lakes, operational and NoSQL databases such as Amazon Aurora, Amazon RDS, and Amazon DynamoDB, and even your streaming data services, so that data is easily and automatically ingested into the warehouse for you, or you can access it in place. No more spending weeks or months building cumbersome, error-prone data pipelines to move data from one system to another.

Maximize value with comprehensive analytics and ML

From executing SQL queries to building complex dashboards or near real-time and AI/generative AI applications, Amazon Redshift makes it easy to analyze all your data and drive your business forward. You can spin up a Redshift Serverless endpoint in seconds and use Amazon Redshift’s Query Editor to load, analyze, visualize, and collaborate on your data across data sources. Submit query requests in plain English and receive custom SQL code recommendations based on your organization’s schema metadata with Amazon Q generative SQL in Query Editor. Seamlessly go from data to predictive analytics with Amazon Redshift ML, which uses familiar SQL to build, train, and deploy machine learning or forecasting models right within the warehouse.

Innovate faster with secure data collaboration

Share data securely across AWS Regions, teams, and third-party data warehouses without moving or copying data. In just a few clicks, multiple teams can read and update shared data sets and collaborate on up-to-date data across Regions, accounts, and even third-party data warehouses. Data sharing is centrally governed by AWS Lake Formation. Have the confidence that your data is secure no matter where you operate or how highly regulated your industry is. Amazon Redshift provides fine-grained access controls such as role-based access control, row- and column-level security, and an easy authentication experience with single sign-on for your organizational identity, all at no additional cost to you.

Best price-performance at any scale

RA3 instances maximize speed for performance-intensive workloads that require large amounts of compute capacity, with the flexibility to pay for compute independently of storage by specifying the number of instances you need.

Run analytics in seconds and scale without the need to set up and manage data warehouse infrastructure. AI-driven scaling and optimization technology (available in preview) enables Redshift Serverless to automatically and proactively provision and scale data warehouse capacity, delivering fast performance for even the most demanding workloads. The system uses AI techniques to learn customer workload patterns across key dimensions, such as concurrent queries, query complexity, influx of data volume, and ETL patterns. It then continuously adjusts resources throughout the day and applies tailored performance optimizations. You can set a desired performance target, and the data warehouse automatically scales to maintain consistent performance.

Columnar storage, data compression, and zone maps reduce the amount of I/O needed to perform queries. Along with the industry-standard encodings such as LZO and Zstandard, Amazon Redshift also offers purpose-built compression encoding, AZ64, for numeric and date/time types to provide both storage savings and optimized query performance.
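As an illustration, compression encodings can be set per column at table creation; the table and column names below are hypothetical:

```sql
-- Hypothetical table: AZ64 for numeric/date columns, Zstandard for text.
CREATE TABLE sales (
    sale_id     BIGINT        ENCODE az64,
    sale_date   DATE          ENCODE az64,
    amount      DECIMAL(12,2) ENCODE az64,
    region      VARCHAR(32)   ENCODE zstd
)
DISTKEY (sale_id)
SORTKEY (sale_date);
```

Omitting the ENCODE clause lets Amazon Redshift choose an encoding automatically based on each column's type.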

Concurrency Scaling supports virtually unlimited concurrent users and concurrent queries with consistent service levels by adding transient capacity in seconds as concurrency increases. Scale with minimal cost impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of 97% of customers.

Start writing to Redshift databases from other Redshift data warehouses in just a few clicks, further enabling data collaboration and flexible scaling of compute for ETL and data processing workloads by adding warehouses of different types and sizes based on your price-performance needs. Gain greater transparency into compute usage, as each warehouse is billed for its own compute, and keep your costs under control.

Amazon Redshift materialized views allow you to achieve significantly faster query performance for iterative or predictable analytical workloads such as dashboarding, queries from Business Intelligence (BI) tools, and extract, load, and transform (ELT) data processing jobs. You can use materialized views to easily store and manage the precomputed results of a SELECT statement that may reference one or more tables, including external tables.
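As a sketch (the table and view names are hypothetical), a materialized view precomputes an aggregation that dashboards can then query directly:

```sql
-- Precompute daily revenue per region from a hypothetical sales table.
CREATE MATERIALIZED VIEW daily_revenue_mv
AUTO REFRESH YES
AS
SELECT sale_date, region, SUM(amount) AS revenue
FROM sales
GROUP BY sale_date, region;

-- Dashboards query the view like a regular table.
SELECT region, revenue
FROM daily_revenue_mv
WHERE sale_date = '2024-06-01';
```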

Deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that run repeat queries experience a significant performance boost. When a query runs, Amazon Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.

Sophisticated algorithms predict and classify incoming queries based on their run times and resource requirements to dynamically manage performance and concurrency, while also helping you prioritize your business-critical workloads. Short query acceleration (SQA) sends short queries from applications such as dashboards to an express queue for immediate processing rather than leaving them starved behind large queries. Automatic workload management (WLM) uses ML to dynamically manage memory and concurrency, helping maximize query throughput. In addition, you can easily set the priority of your most important queries, even when hundreds of queries are being submitted. Redshift Advisor makes recommendations when an explicit user action is needed to further turbocharge Redshift performance. For dynamic workloads where query patterns are not predictable, Automated Materialized Views improve query throughput, lower query latency, and shorten execution time through automatic refresh, automatic query rewrite, incremental refresh, and continuous monitoring of Amazon Redshift clusters. Automatic Table Optimization selects the best sort and distribution keys to optimize performance for the cluster’s workload. If Amazon Redshift determines that applying a key will improve cluster performance, tables are automatically altered without requiring administrator intervention. The additional features Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze eliminate the need for manual maintenance and tuning of Redshift clusters to get the best performance for new clusters and production workloads.

A powerful new table sorting mechanism improves the performance of repetitive queries by automatically sorting data based on incoming query filters (for example, sales in a specific region). This method significantly accelerates table scans compared to traditional sorting methods.

Expand recovery capabilities by reducing recovery time and guaranteeing capacity to automatically recover with no data loss. A Redshift Multi-AZ data warehouse maximizes performance and value by delivering high availability without having to use standby resources.

With Dynamic Data Masking, you can easily protect sensitive data by limiting how much identifiable data is visible to users. You can also define multiple levels of permissions on these fields, so different users and groups have varying levels of data access without the need to create multiple copies of the data, all through Redshift's familiar SQL interface.
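A minimal sketch of a masking policy (the table, column, and role names are hypothetical):

```sql
-- Show analysts only the last four digits of a credit card number.
CREATE MASKING POLICY mask_credit_card
WITH (credit_card VARCHAR(256))
USING ('XXXX-XXXX-XXXX-' || SUBSTRING(credit_card, 16, 4));

-- Apply the policy to one column for one role; other roles can
-- receive different policies on the same column at different priorities.
ATTACH MASKING POLICY mask_credit_card
ON customers (credit_card)
TO ROLE analyst;
```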

Granular row- and column-level security controls ensure that users see only the data they should have access to. Amazon Redshift is integrated with AWS Lake Formation, ensuring that Lake Formation’s column-level access controls are also enforced for Redshift queries on data in the data lake. Amazon Redshift data sharing supports centralized access control with AWS Lake Formation to simplify governance of data shared from Amazon Redshift. AWS Lake Formation is a service that makes it easy to set up secure data lakes, centrally manage granular access to data across all consuming services, and apply row-level and column-level controls.
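For illustration, a row-level security policy (hypothetical table and role names) limits each user to rows they own:

```sql
-- Only rows whose owner column matches the logged-in user are visible.
CREATE RLS POLICY own_rows_only
WITH (owner VARCHAR(64))
USING (owner = current_user);

-- Attach the policy to a table for a role, then switch RLS on.
ATTACH RLS POLICY own_rows_only ON tickets TO ROLE support;
ALTER TABLE tickets ROW LEVEL SECURITY ON;
```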

With just a few parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit, and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk will be encrypted as well as any backups. Amazon Redshift takes care of key management by default.

Amazon Redshift lets you configure firewall rules to control network access to your data warehouse cluster. You can run Amazon Redshift inside Amazon Virtual Private Cloud (VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using an industry-standard encrypted IPsec VPN.

Unify all your data with a zero-ETL approach

No-code integrations between Amazon Aurora, Amazon RDS, and Amazon DynamoDB and Amazon Redshift bring near real-time analytics and machine learning to petabytes of data in these databases. For example, within seconds of transactional data being written into Amazon Aurora, the Amazon Aurora zero-ETL integration with Amazon Redshift seamlessly makes the data available in Amazon Redshift, eliminating the need to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations.

Query live data across one or more Amazon RDS for PostgreSQL, Amazon RDS for MySQL, Amazon Aurora PostgreSQL-Compatible, and Amazon Aurora MySQL-Compatible databases to get instant visibility into full business operations without requiring data movement.
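As a sketch (the endpoint, secret ARN, and schema names are hypothetical), a federated query is set up by mapping the operational database into Redshift as an external schema:

```sql
-- Map a live Aurora PostgreSQL database into Redshift.
CREATE EXTERNAL SCHEMA apg_sales
FROM POSTGRES
DATABASE 'salesdb' SCHEMA 'public'
URI 'my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftFederated'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:apg-creds';

-- Join live operational rows with warehouse history, with no data movement.
SELECT o.order_id, o.status, h.lifetime_value
FROM apg_sales.orders o
JOIN customer_history h ON h.customer_id = o.customer_id;
```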

Amazon Redshift supports read-only queries on Apache Iceberg, Apache Hudi, and Delta Lake table formats. These are open-source table formats that aim to improve transactional consistency while providing more flexibility, better performance, and simplified management for data lake tables, particularly for update- and delete-heavy workloads.

Query open file formats such as Parquet, ORC, JSON, Avro, CSV, and more directly in Amazon S3 using familiar ANSI SQL. To export data to your data lake, simply use the Amazon Redshift UNLOAD command in your SQL code and specify Parquet as the file format, and Amazon Redshift automatically takes care of data formatting and data movement into S3. This gives you the flexibility to store highly structured, frequently accessed data and semi-structured data in an Amazon Redshift data warehouse, while keeping up to exabytes of structured, semi-structured, and unstructured data in Amazon S3. Exporting data from Amazon Redshift back to your data lake lets you analyze the data further with AWS services such as Amazon Athena, Amazon EMR, and Amazon SageMaker.
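For example (the bucket name and IAM role are hypothetical), exporting query results to the data lake takes a single UNLOAD statement:

```sql
-- Export 2024 sales to S3 as partitioned Parquet files.
UNLOAD ('SELECT sale_id, sale_date, region, amount
         FROM sales
         WHERE sale_date >= ''2024-01-01''')
TO 's3://my-analytics-bucket/sales/2024/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnload'
FORMAT AS PARQUET
PARTITION BY (region);
```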

Use SQL (Structured Query Language) to connect to and directly ingest data from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon Redshift streaming ingestion also makes it easy to create and manage downstream pipelines by letting you create materialized views directly on top of streams. The materialized views can include SQL transformations as part of your extract, load, and transform (ELT) pipeline. You can manually refresh defined materialized views to query the most recent streaming data.
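A minimal sketch of streaming ingestion (the stream, schema, and view names are hypothetical):

```sql
-- Map a Kinesis data stream into Redshift.
CREATE EXTERNAL SCHEMA kinesis_events
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreaming';

-- Materialized view over the stream, parsing the JSON payload.
CREATE MATERIALIZED VIEW clickstream_mv AS
SELECT approximate_arrival_timestamp AS arrived_at,
       JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kinesis_data, 'utf-8'), 'page') AS page
FROM kinesis_events."click-stream";

-- Pull in the latest records on demand.
REFRESH MATERIALIZED VIEW clickstream_mv;
```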

Simplify and automate data loading from Amazon S3, reducing the time and effort needed to build custom solutions or manage third-party services. With this feature, Amazon Redshift eliminates the need to manually and repeatedly run COPY procedures by automating file ingestion and handling continuous data loading steps under the hood. Support for auto-copy makes it easy for line-of-business users and data analysts without data engineering knowledge to create ingestion rules and configure the location of the data they wish to load from Amazon S3.
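As a sketch (the bucket, table, and job names are hypothetical), an auto-copy job extends an ordinary COPY command:

```sql
-- Continuously load new files as they land under the S3 prefix.
COPY public.orders
FROM 's3://my-ingest-bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy'
FORMAT AS CSV IGNOREHEADER 1
JOB CREATE orders_autocopy AUTO ON;
```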

Use SQL to make your Amazon Redshift data and data lake more accessible to data analysts, data engineers, and other SQL users with a web-based analyst workbench for data exploration and analysis. Query Editor lets you visualize query results in a single click, create schemas and tables, load data visually, and browse database objects. It also provides an intuitive editor for authoring and sharing SQL queries, analyses, visualizations, and annotations, and securely sharing them with your team.

Maximize value with comprehensive analytics and ML

Run queries within the console or connect SQL client tools, libraries, or data science tools, including Amazon QuickSight, Tableau, Power BI, Querybook, and Jupyter Notebook.

Simple API to interact with Amazon Redshift: Amazon Redshift lets you painlessly access data from all types of traditional, cloud-native, containerized, and serverless web services-based and event-driven applications. The Amazon Redshift Data API simplifies data access, ingest, and egress from programming languages and platforms supported by the AWS SDK, such as Python, Go, Java, Node.js, PHP, Ruby, and C++. The Data API eliminates the need to configure drivers and manage database connections. Instead, you can run SQL commands against an Amazon Redshift cluster by simply calling a secure API endpoint provided by the Data API. The Data API takes care of managing database connections and buffering data. The Data API is asynchronous, so you can retrieve your results later; query results are stored for 24 hours.

Redshift ML makes it easy for data analysts, data scientists, BI professionals, and developers to create, train, and deploy Amazon SageMaker models using SQL. With Redshift ML, you can use SQL statements to create and train Amazon SageMaker models on your data in Amazon Redshift, and then use those models for predictions such as churn detection, financial forecasting, personalization, and risk scoring directly in your queries and reports.
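A sketch of the workflow (the table, column, bucket, and function names are hypothetical):

```sql
-- Train a churn model from historical data; Redshift ML drives
-- Amazon SageMaker training behind the scenes.
CREATE MODEL customer_churn
FROM (SELECT age, tenure_months, monthly_spend, churned
      FROM customer_activity)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftML'
SETTINGS (S3_BUCKET 'my-ml-bucket');

-- Once trained, score new customers with plain SQL.
SELECT customer_id,
       predict_churn(age, tenure_months, monthly_spend) AS churn_risk
FROM customers;
```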

Securely write query requests in plain English directly in Amazon Redshift Query Editor, scoped to your current data access permissions, and receive accurate SQL code recommendations.

Build and run Apache Spark applications on Amazon Redshift data, enabling customers to open up the data warehouse for a broader set of analytics and machine learning solutions. Developers using AWS analytics and ML services such as Amazon EMR, AWS Glue, Amazon Athena Spark, and Amazon SageMaker can effortlessly build Apache Spark applications that read from and write to their Amazon Redshift data warehouse without compromising on performance of the applications or transactional consistency of the data. 

Query data and write data back to your data lake in open formats such as Parquet, ORC, JSON, Avro, and CSV directly in Amazon S3 using familiar ANSI SQL, as described under "Unify all your data with a zero-ETL approach" above.

Innovate faster with secure data collaboration

Extend the ease of use, performance, and cost benefits of Amazon Redshift in a single cluster to multi-cluster deployments while being able to share data. Data sharing enables instant, granular, and fast data access across Redshift clusters without the need to copy or move it.
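A sketch of data sharing (the share, table, and namespace values are hypothetical placeholders):

```sql
-- Producer warehouse: publish a schema and table.
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.sales;
GRANT USAGE ON DATASHARE sales_share
  TO NAMESPACE 'consumer-namespace-guid';

-- Consumer warehouse: mount the share as a database and query it live.
CREATE DATABASE sales_remote FROM DATASHARE sales_share
  OF NAMESPACE 'producer-namespace-guid';
```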

Query Amazon Redshift datasets from your own Redshift cluster without extracting, transforming, and loading (ETL) the data. You can subscribe to Redshift cloud data warehouse products in AWS Data Exchange. As soon as a provider makes an update, the change is visible to subscribers. 

Integration with AWS IAM Identity Center enables organizations to support trusted identity propagation between Amazon Redshift, Amazon QuickSight, and AWS Lake Formation. Customers can use their organization identities to access Amazon Redshift in a single sign-on experience using third-party identity providers (IdPs), such as Microsoft Entra ID, Okta, Ping, and OneLogin, from Amazon QuickSight and Amazon Redshift Query Editor. Administrators can use third-party identity provider users and groups to manage fine-grained access to data across services and audit user-level access in AWS CloudTrail. With trusted identity propagation, a user’s identity is passed seamlessly between Amazon QuickSight, Amazon Redshift, and AWS Lake Formation, reducing time to insights and enabling a friction-free analytics experience.

Write to the same databases using multiple data warehouses, without moving or copying the data.

Accelerate data onboarding and create valuable business insights in minutes by integrating with select partner solutions in the Amazon Redshift console. With these solutions you can bring data from applications such as Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into your Redshift data warehouse in an efficient and streamlined way. It also lets you join these disparate datasets and analyze them together to produce actionable insights.