Amazon Redshift Documentation
Amazon Redshift is designed to accelerate your time to insights with cloud data warehousing at scale.
Analyze your data
Federated Query: With the federated query capability in Redshift, you can reach into your operational relational databases. You can query live data across one or more Amazon Relational Database Service (RDS) for PostgreSQL, Aurora PostgreSQL, RDS for MySQL, and Aurora MySQL databases. You can join data from your Redshift data warehouses, your data lakes, and your operational stores. Redshift offers optimizations that reduce data movement over the network, as well as parallel data processing.
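As a rough sketch of how federated query is typically set up (the database name, endpoint URI, IAM role, and secret ARN below are placeholders), you register an external schema that maps to the live database and then join it with local tables:

```sql
-- Register an external schema over a live Aurora PostgreSQL database.
CREATE EXTERNAL SCHEMA apg_sales
FROM POSTGRES
DATABASE 'sales' SCHEMA 'public'
URI 'my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:apg-creds';

-- Join live operational rows with warehouse history in a single query.
SELECT o.order_id, o.status, h.lifetime_spend
FROM apg_sales.orders o                 -- live rows in Aurora PostgreSQL
JOIN analytics.customer_history h       -- local Redshift table
  ON o.customer_id = h.customer_id
WHERE o.order_date >= CURRENT_DATE - 7;
```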
Data sharing: Amazon Redshift data sharing is designed to let you extend a single-cluster Amazon Redshift deployment to multiple clusters while sharing data across them. It enables fast data access across Redshift clusters without the need to copy or move data, and it provides live access so your users can see information as it is updated in the data warehouse. You can share live data with Redshift clusters in the same or different AWS accounts and across regions.
AWS Data Exchange for Amazon Redshift: You can query Amazon Redshift datasets from your own Redshift cluster without extracting, transforming, and loading (ETL) the data. You can subscribe to Redshift cloud data warehouse products in AWS Data Exchange; when a provider makes an update, the change is visible to subscribers. If you are a data provider, access is granted when a subscription starts and revoked when it ends, invoices are generated when payments are due, and payments are collected through AWS. You can license access to flat files, data in Amazon Redshift, and data delivered through APIs, all with a single subscription.
Redshift ML: Amazon Redshift ML is designed to enable data analysts, data scientists, BI professionals, and developers to create, train, and deploy Amazon SageMaker models using SQL. With Redshift ML, you can use SQL statements to create and train Amazon SageMaker models on your data in Amazon Redshift and then use those models for predictions such as churn detection, financial forecasting, personalization, and risk scoring directly in your queries and reports.
Amazon Redshift Integration for Apache Spark: This feature enables you to build and run Apache Spark applications on Amazon Redshift data. With Amazon Redshift Integration for Apache Spark, developers using AWS analytics and ML services such as Amazon EMR, AWS Glue, Amazon Athena Spark, and Amazon SageMaker can build Apache Spark applications that read from and write to their Amazon Redshift data warehouse. Amazon Redshift Integration for Apache Spark is designed to allow you to monitor and troubleshoot performance issues of Apache Spark applications when using them with Amazon Redshift.
Amazon Aurora zero-ETL integration with Amazon Redshift: This is a no-code integration between Amazon Aurora and Amazon Redshift that is designed to enable Amazon Aurora customers to use Amazon Redshift for near real-time analytics and machine learning on large amounts of transactional data. The feature is designed so that within seconds of transactional data being written into Amazon Aurora, the data is available in Amazon Redshift, making it unnecessary for customers to build and maintain complex data pipelines performing extract, transform, and load (ETL) operations. This integration is designed to reduce operational burden and cost, and enable customers to focus on improving their applications.
Streaming Ingestion: With the streaming ingestion capability in Amazon Redshift, you can use SQL (Structured Query Language) to connect to and directly ingest data from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (MSK). Amazon Redshift Streaming Ingestion is designed to allow you to create and manage downstream pipelines by letting you create materialized views directly on top of streams. The materialized views can also include SQL transformations as part of your extract, load, and transform (ELT) pipeline. You can manually refresh defined materialized views to query the most recent streaming data.
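The flow above might look like the following sketch, assuming a hypothetical Kinesis data stream named my-click-stream and a placeholder IAM role; JSON_PARSE stores each record body as a SUPER value:

```sql
-- Map a Kinesis data stream into Redshift via an external schema.
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';

-- A materialized view over the stream; each refresh ingests new records.
CREATE MATERIALIZED VIEW clickstream_mv AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(kinesis_data) AS payload   -- record body as SUPER
FROM kds."my-click-stream";

-- Pull the latest streaming data on demand.
REFRESH MATERIALIZED VIEW clickstream_mv;
```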
Query and export data to and from your data lake: You can both query data and write data back to your data lake in open formats. You can query open file formats such as Parquet, ORC, JSON, Avro, CSV, and more directly in Amazon S3 using familiar ANSI SQL. To export data to your data lake, you can use the Amazon Redshift UNLOAD command in your SQL code and specify Parquet as the file format, and Amazon Redshift is designed to format data and move it into Amazon S3. This is designed to give you the flexibility to store highly structured, frequently accessed data and semi-structured data in an Amazon Redshift data warehouse, while also keeping structured, semi-structured, and unstructured data in Amazon S3. Exporting data from Amazon Redshift back to your data lake helps you to analyze the data further with AWS services like Amazon Athena, Amazon EMR, and Amazon SageMaker.
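A sketch of the export path described above, using the UNLOAD command with Parquet output; the table, bucket, and role names are illustrative:

```sql
-- Export query results to the data lake as partitioned Parquet files.
UNLOAD ('SELECT event_date, region, SUM(revenue) AS revenue
         FROM analytics.sales
         GROUP BY event_date, region')
TO 's3://my-data-lake/sales_summary/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
FORMAT AS PARQUET
PARTITION BY (event_date);
```

Services such as Amazon Athena or Amazon EMR can then read the exported files directly from Amazon S3.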
AWS services integration: Native integration with AWS analytics, database, and machine learning services is designed to make it easier to handle analytics workflows. For example, AWS Lake Formation is a service that helps set up a secure data lake. AWS Glue can extract, transform, and load (ETL) data into Amazon Redshift. Amazon Kinesis Data Firehose can help you to capture, transform, and load streaming data into Amazon Redshift for analytics. Amazon EMR is designed to process data using Hadoop/Spark and load the output into Amazon Redshift for BI and analytics. Amazon QuickSight is the BI service that you can use to create reports, visualizations, and dashboards on Amazon Redshift data. You can use Amazon Redshift to prepare your data to run machine learning (ML) workloads with Amazon SageMaker. To accelerate migrations to Amazon Redshift, you can use the AWS Schema Conversion Tool and the AWS Database Migration Service (DMS). Amazon Redshift is integrated with AWS Key Management Service (AWS KMS) and Amazon CloudWatch for security, monitoring, and compliance. You can also use Lambda user-defined functions (UDFs) to invoke a Lambda function from your SQL queries as you would any other user-defined function in Redshift. You can write Lambda UDFs to integrate with AWS Partner services and to access other AWS services such as Amazon DynamoDB and Amazon SageMaker.
Partner console integration: You can integrate with select partner solutions in the Amazon Redshift console. With these solutions you can bring data from applications such as Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into your Redshift data warehouse. It is also designed to let you join these datasets and analyze them together to produce insights.
Native support for advanced analytics: Amazon Redshift supports standard scalar data types such as NUMERIC, VARCHAR, and TIMESTAMP and provides native support for the following advanced analytics processing:
• Spatial data processing: Amazon Redshift provides a polymorphic data type, GEOMETRY, which supports multiple geometric shapes such as Point, LineString, and Polygon. Amazon Redshift also provides spatial SQL functions to construct geometric shapes and to import, export, access, and process spatial data. You can add GEOMETRY columns to Amazon Redshift tables and write SQL queries that span spatial and non-spatial data. This capability enables you to store, retrieve, and process spatial data and integrate it into your analytical queries. With Amazon Redshift’s ability to query data lakes, you can also extend spatial processing to data lakes by referencing external tables in spatial queries.
• HyperLogLog sketches: HyperLogLog is an algorithm that estimates the approximate number of distinct values in a data set. An HLL sketch is a construct that encapsulates the information about the distinct values in the data set. Amazon Redshift provides the HLLSKETCH data type and associated SQL functions to generate, persist, and combine HyperLogLog sketches. The Amazon Redshift HyperLogLog capability uses bias correction techniques and is designed to provide high accuracy with a low memory footprint.
• DATE & TIME data types: Amazon Redshift provides the DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ data types to natively store and process date/time data. TIME and TIMESTAMP types store the time data without time zone information, whereas TIMETZ and TIMESTAMPTZ types store the time data including the time zone information. You can use various date/time SQL functions to process the date and time values in Amazon Redshift queries.
• Semi-structured data processing: The Amazon Redshift SUPER data type is designed to natively store JSON and other semi-structured data in Amazon Redshift tables, and uses the PartiQL query language to process the semi-structured data. The SUPER data type is schemaless in nature and allows storage of nested values that may contain Amazon Redshift scalar values, nested arrays, and nested structures. PartiQL is an extension of SQL and provides querying capabilities such as object and array navigation, unnesting of arrays, dynamic typing, and schemaless semantics. This can help you achieve advanced analytics that combine classic structured SQL data with semi-structured SUPER data.
• Integration with third-party tools: There are many options to enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming, and visualizing data. Our Partners have certified their solutions to work with Amazon Redshift.
• Load and transform your data with Data Integration Partners.
• Analyze data and share insights across your organization with Business Intelligence Partners.
• Architect and implement your analytics platform with System Integration and Consulting Partners.
• Query, explore, and model your data using tools and utilities from Query and Data Modeling Partners.
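The spatial processing described above can be sketched as follows, with hypothetical table names and coordinate values:

```sql
-- A table mixing scalar and spatial columns.
CREATE TABLE stores (store_id INT, name VARCHAR(100), loc GEOMETRY);

INSERT INTO stores VALUES
  (1, 'Downtown', ST_GeomFromText('POINT(-122.33 47.61)', 4326));

-- Distance in meters between each store and a fixed reference point.
SELECT name,
       ST_DistanceSphere(loc, ST_GeomFromText('POINT(-122.30 47.60)', 4326)) AS meters
FROM stores;
```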
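The HyperLogLog capability might be used like this sketch, where the table and column names are illustrative; sketches persisted per day can be combined later for a multi-day estimate:

```sql
-- Persist one HLL sketch of distinct users per day.
CREATE TABLE daily_uniques AS
SELECT event_date, HLL_CREATE_SKETCH(user_id) AS users_sketch
FROM analytics.events
GROUP BY event_date;

-- Combine the daily sketches to estimate distinct users across all days.
SELECT HLL_CARDINALITY(HLL_COMBINE(users_sketch)) AS approx_unique_users
FROM daily_uniques;
```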
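As an illustrative sketch of the SUPER data type and PartiQL navigation described above (the table and field names are hypothetical):

```sql
-- Store a nested JSON document in a SUPER column.
CREATE TABLE orders (order_id INT, detail SUPER);

INSERT INTO orders VALUES
  (1, JSON_PARSE('{"customer":"Ana","items":[{"sku":"A1","qty":2},{"sku":"B7","qty":1}]}'));

-- PartiQL object navigation plus array unnesting: one row per order item.
SELECT o.order_id, o.detail.customer, i.sku, i.qty
FROM orders o, o.detail.items AS i;
```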
Performance at scale
RA3 instances: RA3 instances are designed to improve speed for performance-intensive workloads that require large amounts of compute capacity, with the flexibility to pay for compute independently of storage by specifying the number of instances you need.
Storage and query processing: Amazon Redshift is designed to provide fast query performance on datasets of varying sizes. Columnar storage, data compression, and zone maps are designed to reduce the amount of I/O needed to perform queries. Along with the encodings such as LZO and Zstandard, Amazon Redshift also offers compression encoding, AZ64, for numeric and date/time types.
Concurrency: Amazon Redshift is designed to provide fast performance, whether data is queried in your Amazon Redshift data warehouse or in your Amazon S3 data lake. Amazon Redshift Concurrency Scaling is designed to support many concurrent users and concurrent queries by adding transient capacity as concurrency increases.
Materialized views: Amazon Redshift materialized views are designed to help you achieve faster query performance for iterative or predictable analytical workloads such as dashboards, queries from Business Intelligence (BI) tools, and extract, transform, and load (ETL) data processing jobs. You can use materialized views to store and manage precomputed results of a SELECT statement that may reference one or more tables, including external tables. Subsequent queries that reference the materialized views can reuse the precomputed results. Amazon Redshift is designed to maintain materialized views incrementally so they continue to provide low-latency performance benefits.
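A minimal sketch of a materialized view with automatic refresh, using illustrative table and column names:

```sql
-- A precomputed daily revenue rollup; AUTO REFRESH lets Redshift
-- maintain it incrementally as the base table changes.
CREATE MATERIALIZED VIEW daily_revenue_mv
AUTO REFRESH YES
AS
SELECT sale_date, region, SUM(amount) AS revenue
FROM analytics.sales
GROUP BY sale_date, region;

-- Dashboards can query the view directly and reuse the precomputed results.
SELECT region, revenue
FROM daily_revenue_mv
WHERE sale_date = CURRENT_DATE - 1;
```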
Automated Materialized Views: Automated Materialized Views (AutoMVs) are designed to improve throughput of queries, lower query latency, shorten execution time through automatic refresh, auto query rewrite, incremental refresh, and continuous monitoring of Amazon Redshift clusters. Amazon Redshift is designed to balance the creation and management of AutoMVs with resource utilization.
Machine learning to enhance throughput and performance: ML capabilities in Amazon Redshift can help deliver high throughput and performance. Amazon Redshift uses algorithms to predict and classify incoming queries based on their run times and resource requirements to dynamically manage performance and concurrency. Short query acceleration (SQA) sends short queries from applications such as dashboards to an express queue for processing. Automatic workload management (WLM) uses machine learning to help dynamically manage memory and concurrency. In addition, you can set the priority of your most important queries. Amazon Redshift is designed to be a self-learning system that observes the user workload, determining the opportunities to improve performance as the usage grows, applying optimizations, and making recommendations through Redshift Advisor when an explicit user action is needed.
Result caching: Amazon Redshift is designed to use result caching to deliver fast response times for repeat queries. When a query runs, Amazon Redshift is designed to search the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned instead of re-running the query.
Data warehousing at scale: Amazon Redshift is designed to be simple to use and to scale quickly as your needs change. Through the console or with a simple API call, you can change the number or type of nodes in your data warehouse, and scale up or down. You can also run queries against large amounts of data in Amazon S3 without having to load or transform any data with the Redshift Spectrum feature. You can use Amazon S3 as a highly available, secure, and effective data lake to store data in open data formats. Amazon Redshift Spectrum is designed to run queries across thousands of parallelized nodes to help deliver fast results.
Flexible pricing options: You can optimize how you pay for Amazon Redshift. You can start small for just cents per hour with no commitments, and scale out as your data grows. Amazon Redshift offers on-demand pricing, Reserved Instance pricing, and per-query pricing. Amazon Redshift’s pricing includes security, data compression, backup storage, and data transfer features. As the size of data grows, you can use managed storage in the RA3 instances to store data.
Predictable cost, even with unpredictable workloads: Amazon Redshift is designed to help you scale with minimal cost impact, as each cluster earns Concurrency Scaling credits. This provides you with the ability to predict your month-to-month cost, even during periods of fluctuating analytical demand.
Choose your node type to get the best value for your workloads: You can select from three instance types to optimize Amazon Redshift for your data warehousing needs: RA3 nodes, Dense Compute nodes, and Dense Storage nodes.
RA3 nodes are designed to let you scale storage independently of compute. With RA3, you get a data warehouse that stores data in a separate storage layer. You only need to size the data warehouse for the query performance that you need.
Dense Compute (DC) nodes are designed to allow you to create data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs), and are a recommended choice for less than 500 GB of data.
Dense Storage (DS2) nodes are designed to let you create data warehouses using hard disk drives (HDDs).
Scaling your cluster or switching between node types can be done with an API call or in the AWS Management Console.
Security and compliance
Amazon Redshift Serverless: Amazon Redshift Serverless is a serverless option of Amazon Redshift that is designed to make it easier to run and scale analytics without the need to set up and manage data warehouse infrastructure. With Redshift Serverless, users—including data analysts, developers, business professionals, and data scientists—can load and query data in the data warehouse.
Query Editor v2: You can use SQL to make your Amazon Redshift data and data lake more accessible to data analysts, data engineers, and other SQL users with a web-based analyst workbench for data exploration and analysis. Query Editor v2 lets you visualize query results in a single click, create schemas and tables, load data visually, and browse database objects. It also provides an editor for authoring SQL queries, analyses, visualizations, and annotations, and for sharing them with your team.
Table Design: Amazon Redshift is designed to monitor user workloads and use sophisticated algorithms for the physical layout of data to optimize query speeds. Automatic Table Optimization is designed to select the best sort and distribution keys to optimize performance for the cluster’s workload. If Amazon Redshift determines that applying a key will improve cluster performance, it is designed to alter tables without requiring administrator intervention. Additional features like Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze are designed to eliminate the need for manual maintenance and tuning of Redshift clusters.
Query using your own tools: Amazon Redshift is designed to give you the ability to run queries within the console or connect SQL client tools, libraries, or data science tools including Amazon QuickSight, Tableau, Power BI, QueryBook, and Jupyter Notebook.
API to interact with Amazon Redshift: Amazon Redshift is designed to enable you to access data from many types of traditional, cloud-native, containerized, and serverless web service-based and event-driven applications. The Amazon Redshift Data API can help simplify data access, ingest, and egress from programming languages and platforms supported by the AWS SDK such as Python, Go, Java, Node.js, PHP, Ruby, and C++. The Data API helps eliminate the need for configuring drivers and managing database connections. Instead, you can run SQL commands on an Amazon Redshift cluster by calling a secured API endpoint provided by the Data API. The Data API takes care of managing database connections and buffering data. The Data API is asynchronous, so you can retrieve your results later. Your query results are stored for 24 hours.
Fault tolerant: There are multiple features that are designed to enhance the reliability of your data warehouse cluster. For example, Amazon Redshift is designed to continuously monitor the health of the cluster, and re-replicates data from failed drives and replaces nodes as necessary for fault tolerance. Clusters can also be relocated to alternative Availability Zones (AZs).
Granular access controls: Granular row and column level security controls are designed so that users see only the data they should have access to. Amazon Redshift is integrated with AWS Lake Formation so that Lake Formation’s column level access controls are also enforced for Redshift queries on the data in the data lake.
Amazon Redshift data sharing supports centralized access control with AWS Lake Formation to support governance of data shared from Amazon Redshift. AWS Lake Formation (LF) is a service that helps you set up data lakes, centrally manage granular access to data across all consuming services, and apply row level and column level controls.
Dynamic Data Masking: Dynamic Data Masking is designed so customers can protect their sensitive data by limiting how much identifiable data is visible to users. Customers can define multiple levels of permissions on these fields so different users and groups can have varying levels of data access without having to create multiple copies of data, all through Amazon Redshift's SQL interface.
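A sketch of dynamic data masking in SQL, assuming a hypothetical payments table with a credit_card column and an analyst role:

```sql
-- Mask all but the last four digits of a card number.
CREATE MASKING POLICY mask_credit_card
WITH (credit_card VARCHAR(256))
USING ('XXXX-XXXX-XXXX-' || SUBSTRING(credit_card, 16, 4));

-- Apply the policy to one column for one role; other roles can be
-- given different policies on the same column, without copying data.
ATTACH MASKING POLICY mask_credit_card
ON payments (credit_card)
TO ROLE analyst;
```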
Multi AZ: The new Amazon Redshift Multi-AZ configuration is designed to reduce recovery time and ensure capacity to automatically recover with no data loss. A Redshift Multi-AZ data warehouse is designed to deliver high availability without having to use standby resources.
End-to-end encryption: You can set up Amazon Redshift to use SSL to secure data in transit, and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk is encrypted, as are any backups. Amazon Redshift is designed to take care of key management by default.
Network isolation: Amazon Redshift enables you to configure firewall rules to control network access to your data warehouse cluster. You can run Amazon Redshift inside Amazon Virtual Private Cloud (VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using encrypted IPsec VPN.
Audit and compliance: Amazon Redshift integrates with AWS CloudTrail to enable you to audit your Redshift API calls. Redshift is designed to log all SQL operations, including connection attempts, queries, and changes to your data warehouse. You can access these logs using SQL queries against system tables, or choose to save the logs to Amazon S3.
Tokenization: AWS Lambda user-defined functions (UDFs) enable you to use an AWS Lambda function as a UDF in Amazon Redshift and invoke it from Redshift SQL queries. You can write Lambda UDFs to enable external tokenization, data masking, and identification or de-identification of data by integrating with third-party vendors.
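A tokenization-style Lambda UDF might be registered like this sketch; the Lambda function name, IAM role, and table are placeholders:

```sql
-- Register a Lambda function as a scalar SQL function.
CREATE EXTERNAL FUNCTION detokenize(VARCHAR)
RETURNS VARCHAR
STABLE
LAMBDA 'my-detokenize-function'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLambdaRole';

-- Invoke it like any other function; the Lambda can call a
-- third-party tokenization service and return the clear value.
SELECT detokenize(card_token) FROM payments;
```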
Amazon Redshift: AWS Services Integration
Whether your data is stored in operational data stores, data lakes, streaming data services, or third-party datasets, Amazon Redshift is designed to help you access, combine, and share data, and run analytics with minimal data movement or copying. Amazon Redshift is integrated with AWS database, analytics, and machine learning (ML) services using zero-ETL approaches to help you access data in place for near-real-time analytics, build ML models in SQL, and activate Apache Spark analytics using data in Amazon Redshift. You can reduce data management activities like building complex extract, transform, and load (ETL) pipelines. You can access data in place without data movement or copying, and ingest data into the warehouse for analytics.
Unified analytics
You can reduce your data silos with federated querying, which accesses your data in place from operational databases, data lakes, and data warehouses. Enable your organizations across accounts and Regions to work on shared, transactionally consistent data without the need for data movement or data copying. Find, subscribe to, and query third-party datasets with zero ETL. Connect this data to a BI tool like Amazon QuickSight or to your application using data APIs for dashboarding, line of business analysis, and business decision making.
Data ingestion
You can run near-real-time analytics on your transactional data with Amazon Aurora zero-ETL integration with Amazon Redshift, which is designed to make data available in the warehouse for analytics within seconds of it being written into Amazon Aurora. Support for autocopy allows file ingestion from Amazon Simple Storage Service (S3). Redshift Streaming Ingestion capabilities are designed to allow you to ingest any amount of streaming data with high throughput and low latency.
Analytics and built-in ML
Developers can run Apache Spark applications directly on Amazon Redshift data from AWS Analytics services such as Amazon EMR and AWS Glue. Amazon Redshift integration for Apache Spark expands the data warehouse for a broader set of analytics. With Amazon Redshift ML, you can run billions of predictions with simple SQL commands with native integration into Amazon SageMaker. Amazon Redshift’s integration with Amazon Forecast helps you conduct ML forecasting using SQL.
Amazon Redshift Concurrency Scaling
Analytics workloads can be highly unpredictable, resulting in slower query performance and users competing for resources.
The Concurrency Scaling feature is designed to support thousands of concurrent users and concurrent queries, with consistently fast query performance. As concurrency increases, Amazon Redshift adds query processing power to process queries. Once the workload demand subsides, this extra processing power is removed.
Concurrency Scaling is designed to help you:
1. Get consistently fast performance for thousands of concurrent queries and users.
2. Allocate the clusters to specific user groups and workloads, and control the number of clusters that can be used.
3. Continue to use your existing applications and Business Intelligence tools.
Amazon Redshift Data Sharing
Amazon Redshift data sharing is designed to help you extend the benefits of Amazon Redshift to multi-cluster deployments while being able to share data. Data sharing enables granular and fast data access across Amazon Redshift clusters without the need to copy or move it. Data sharing is designed to provide live access to data so that your users can see information as it’s updated in the data warehouse. You can share live data with Amazon Redshift clusters in the same or different AWS accounts and across Regions.
Amazon Redshift data sharing is designed to provide:
- A simple and direct way to share data across Amazon Redshift data warehouses.
- Fast, granular, and high-performance access without data copies or data movement.
- Live and transactionally consistent views of data across all consumers.
- Secure and governed collaboration within and across organizations and with external parties.
Data sharing builds on Amazon Redshift RA3 managed storage, which is designed to decouple storage and compute, allowing either of them to scale independently. With data sharing, workloads accessing shared data are isolated from each other. Queries accessing shared data run on the consumer cluster and read data from the Amazon Redshift managed storage layer directly without impacting the performance of the producer cluster. Workloads accessing shared data can be provisioned with flexible compute resources that meet their workload-specific requirements and be scaled independently as needed in a self-service fashion.
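The producer/consumer flow above can be sketched as follows; the share name, schema, and namespace GUIDs are placeholders:

```sql
-- On the producer cluster: create and populate a datashare.
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA analytics;
ALTER DATASHARE sales_share ADD TABLE analytics.sales;

-- Grant the consumer cluster's namespace access to the share.
GRANT USAGE ON DATASHARE sales_share
TO NAMESPACE 'a1b2c3d4-5678-90ab-cdef-111111111111';

-- On the consumer cluster: surface the share as a database and query it live,
-- reading directly from managed storage without copying data.
CREATE DATABASE sales_db
FROM DATASHARE sales_share
OF NAMESPACE 'f0e1d2c3-4567-89ab-cdef-222222222222';

SELECT COUNT(*) FROM sales_db.analytics.sales;
```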
AWS Data Exchange for Amazon Redshift
AWS Data Exchange for Amazon Redshift enables you to find and subscribe to third-party data in AWS Data Exchange that you can query in an Amazon Redshift data warehouse. You can also license your data in Amazon Redshift through AWS Data Exchange: Access is granted when a customer subscribes to your data and revoked when their subscription ends, invoices are generated, and payments are collected and disbursed through AWS. This feature empowers you to query, analyze, and build applications with third-party data.
Multiple data types on a single subscription
AWS Data Exchange subscribers can access data in Amazon Redshift and Amazon S3 files with a single subscription.
Reduce heavy lifting
Access to your Amazon Redshift data is granted when a subscription starts and removed when a subscription ends, invoices are generated when a payment is due, and payments are centrally collected and disbursed through AWS.
Improved business intelligence
You can see which tables your subscribers are using, helping you understand which data they value most. For privacy reasons, the queries they execute and the data they return are not shared.
Amazon Redshift Serverless
Amazon Redshift Serverless is designed to make it easier to run and scale analytics without having to manage data warehouse infrastructure. Users such as developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform analytics, share and collaborate on data, and build and train machine learning (ML) models. Amazon Redshift Serverless is designed to provision and scale data warehouse capacity to deliver fast performance for all workloads.
Get insights from data
Amazon Redshift Serverless is designed to help you focus on obtaining insights by getting started quickly and running real-time or predictive analytics on all your data without managing data warehouse infrastructure.
Performance
Amazon Redshift Serverless is designed to scale data warehouse capacity up or down to deliver fast performance for all workloads.
Manage costs and budget
You can pay on a per-second basis. You can set your spend limit and manage your budget with granular spend controls.
Get started quickly
Amazon Redshift Serverless is designed to allow you to load data and get started with your favorite BI tool.
Amazon Redshift Streaming Ingestion
Natively integrating with Amazon streaming engines, Amazon Redshift Streaming Ingestion ingests hundreds of megabytes of data per second so you can query data quickly. With Amazon Redshift Streaming Ingestion, you can connect to multiple Amazon Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (MSK) data streams and pull data directly into Amazon Redshift without staging data in Amazon Simple Storage Service (S3). You can define a schema or choose to ingest semi-structured data with the SUPER data type, and set up and manage extract, load, and transform (ELT) pipelines with SQL.
High throughput, low latency
Process large volumes of streaming data from multiple sources with low latency and high throughput to derive insights quickly.
Direct ingestion process
Directly ingest streaming data into your data warehouse from Kinesis Data Streams and MSK without the need to stage in Amazon S3.
Get started managing downstream processing
Perform analytics on streaming data within Amazon Redshift using SQL. Define and build materialized views directly on top of streams. Create and manage downstream ELT pipelines by creating materialized views on top of other materialized views, using user-defined functions and stored procedures in Amazon Redshift.
Amazon Redshift Security & Governance
Amazon Redshift supports industry-leading security with built-in identity management and federation for single sign-on (SSO), multi-factor authentication, column-level access control, row-level security, role-based access control, Amazon Virtual Private Cloud (Amazon VPC), and faster cluster resize. You can configure Amazon Redshift to protect data in transit and at rest.
Infrastructure security
You can control network access to your data warehouse cluster through firewall rules. Using Amazon Virtual Private Cloud (VPC), you can isolate your Redshift data warehouse cluster in your own virtual network, and connect to your existing IT infrastructure using industry-standard encrypted IPsec VPN without using public IPs or requiring traffic to traverse the Internet. You can keep your data encrypted at rest and in transit.
Audit and compliance
Amazon Redshift integrates with AWS CloudTrail to enable you to audit Redshift API calls. Redshift logs all SQL operations, including connection attempts, queries, and changes to your data warehouse. It is designed to deliver audit logs for analysis with minimal latency, and you can choose to stream audit logs directly to Amazon CloudWatch for real-time monitoring. Amazon Redshift offers tools and security measures that customers can use to evaluate, meet, and demonstrate compliance with applicable legal and regulatory requirements.
Identity Management
You can use AWS Identity and Access Management (IAM) to authenticate requests and improve the security of your resources. Role-based access control (RBAC) helps you simplify the management of security privileges in Amazon Redshift and control end user access to data at a broad or granular level based on job role, permission rights, and level of data sensitivity. You can also map database users to IAM roles for federated access. You can restrict access to data at the row or column level with column-level security (CLS) and row-level security (RLS) controls, and you can combine these controls for granular access to data. Dynamic data masking in Amazon Redshift is designed to help you selectively mask personal data at query time based on job role, permission rights, and level of data sensitivity. You can control data masking policies with SQL commands and restrict different levels of permissions to masked data by applying Amazon Redshift RBAC.
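A sketch combining RBAC with a row-level security policy, using illustrative role, table, and column names:

```sql
-- Role-based access: grant table access to a role, not to individuals.
CREATE ROLE sales_analyst;
GRANT SELECT ON TABLE analytics.sales TO ROLE sales_analyst;

-- Row-level security: each analyst sees only rows they own.
CREATE RLS POLICY own_rows_only
WITH (sales_rep VARCHAR(64))
USING (sales_rep = current_user);

ATTACH RLS POLICY own_rows_only
ON analytics.sales
TO ROLE sales_analyst;

-- Turn row-level security on for the table.
ALTER TABLE analytics.sales ROW LEVEL SECURITY ON;
```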
Governance
AWS Lake Formation helps you simplify governance of Amazon Redshift Data Shares, and centrally manage data being shared across your organization. With AWS Lake Formation governing data sharing, you can have visibility and control of data being shared across accounts within your organization. Your data administrators can define policies and execute them across Amazon Redshift Data Shares.
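For illustration, the data shares governed above are created and granted on the producer cluster with SQL like the following; the object names and namespace GUIDs are placeholders:

```sql
-- On the producer cluster: create a datashare and add objects to it.
CREATE DATASHARE salesshare;
ALTER DATASHARE salesshare ADD SCHEMA public;
ALTER DATASHARE salesshare ADD TABLE public.sales;

-- Grant access to a consumer cluster's namespace GUID.
GRANT USAGE ON DATASHARE salesshare
TO NAMESPACE '<consumer-namespace-guid>';

-- On the consumer cluster: create a database from the share and query it.
CREATE DATABASE sales_db FROM DATASHARE salesshare
OF NAMESPACE '<producer-namespace-guid>';
SELECT COUNT(*) FROM sales_db.public.sales;
```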
Amazon Redshift Query Editor v2.0
Amazon Redshift Query Editor v2.0 is a web-based analyst workbench designed to help you explore, share, and collaborate on data in SQL through a common interface.
Amazon Redshift Query Editor v2.0 allows you to query your data using SQL and visualize your results using charts and graphs. With Amazon Redshift Query Editor v2.0, you can collaborate by sharing saved queries, results, and analyses.
Amazon Redshift is designed to help simplify organizing, documenting, and sharing multiple SQL queries with support for SQL Notebooks (preview) in Amazon Redshift Query Editor v2.0. The Notebook interface is designed to let users author queries more easily by organizing multiple SQL queries and annotations in a single document. They can also share Notebooks.
Access
Amazon Redshift Query Editor v2.0 is a web-based tool that allows you to query and analyze data without requiring permissions to access the Amazon Redshift console.
Browsing and visualization
Use the Amazon Redshift Query Editor v2.0 navigator to browse database objects including tables, views, and stored procedures. Use visual wizards to create tables and functions, and to load and unload data.
SQL Notebooks support (preview)
You can use SQL Notebooks support (preview) to organize related queries by saving them together in a folder, or combining them into a single saved query with multiple statements.
Query editor
Amazon Redshift Query Editor v2.0’s query editor can auto-complete commands, run multiple queries, and execute multi-statement queries that return multiple result sets.
Exporting and building charts
Amazon Redshift Query Editor v2.0 is designed to help you analyze and sort data without having to re-run queries, export results as JSON or CSV files, and build charts for visual analysis.
Collaboration
You can use Amazon Redshift Query Editor v2.0’s version management for saved queries to collaborate with other SQL users using a common interface. You can collaborate and share different versions of queries, results, and charts.
Amazon Redshift RA3 instances with managed storage
Snapshots and Recovery Points
Amazon Redshift offers snapshots and recovery points. These can be used to recover an entire cluster or table from a previous point in time.
Cross-Region Copy Snapshots
You can configure Amazon Redshift to automatically copy snapshots for a cluster to another AWS Region. If anything affects your primary AWS Region, a copy of your snapshots in another AWS Region lets you restore your cluster there from recent data.
Amazon Redshift ML
Amazon Redshift ML can help data analysts and database developers create, train, and apply machine learning models using SQL commands in Amazon Redshift data warehouses. With Redshift ML, you can take advantage of Amazon SageMaker, a managed machine learning service, without learning new tools or languages. You can use SQL statements to create and train Amazon SageMaker machine learning models using your Redshift data.
Because Redshift ML lets you use standard SQL, it can help you stay productive while applying your analytics data to new use cases. Redshift ML integrates Redshift with Amazon SageMaker and enables inference within the Redshift cluster, so you can use predictions generated by ML models in queries and applications. There is no need to manage a separate inference endpoint, and the training data is secured end-to-end with encryption.
Use ML on your Redshift data using standard SQL
To get started, use the CREATE MODEL SQL command in Redshift and specify training data either as a table or a SELECT statement. Redshift ML is designed to then compile and import the trained model inside the Redshift data warehouse and prepare a SQL inference function that can be used immediately in SQL queries. Redshift ML handles all the steps needed to train and deploy a model.
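A minimal sketch of this flow follows; the table, column, and bucket names are hypothetical:

```sql
-- Train a model on historical data; Redshift ML drives SageMaker behind
-- the scenes and registers a local inference function.
CREATE MODEL customer_churn
FROM (SELECT age, tenure_months, monthly_charges, churned
      FROM customer_activity
      WHERE snapshot_date < '2024-01-01')
TARGET churned
FUNCTION predict_churn
IAM_ROLE DEFAULT
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');

-- Use the generated function directly in queries and reports.
SELECT customer_id,
       predict_churn(age, tenure_months, monthly_charges) AS churn_risk
FROM customer_activity
WHERE snapshot_date >= '2024-01-01';
```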
Predictive analytics with Amazon Redshift
With Redshift ML, you can embed predictions like fraud detection, risk scoring, and churn prediction directly in queries and reports. Use the SQL function to apply the ML model to your data in queries, reports, and dashboards.
Bring your own model (BYOM)
Redshift ML supports bring-your-own-model (BYOM) for local or remote inference. You can use a model trained outside of Redshift with Amazon SageMaker for in-database (local) inference in Amazon Redshift: both SageMaker Autopilot models and models trained directly in Amazon SageMaker can be imported for local inference. Alternatively, you can invoke custom ML models deployed on remote SageMaker endpoints. Any SageMaker ML model that accepts and returns text or CSV can be used for remote inference.
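Both BYOM modes are expressed through CREATE MODEL; in this sketch the training-job name, endpoint name, and function signatures are placeholders:

```sql
-- Local inference: import a model trained in SageMaker (for example,
-- from a completed training job) and run predictions inside Redshift.
CREATE MODEL churn_local
FROM 'my-sagemaker-training-job'
FUNCTION predict_churn_local(VARCHAR, INT)
RETURNS CHAR
IAM_ROLE DEFAULT
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');

-- Remote inference: invoke a model deployed on a SageMaker endpoint.
CREATE MODEL churn_remote
FUNCTION predict_churn_remote(VARCHAR, INT)
RETURNS CHAR
SAGEMAKER 'my-sagemaker-endpoint'
IAM_ROLE DEFAULT;
```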
Amazon Redshift Integration for Apache Spark
Amazon Redshift Integration for Apache Spark enables Apache Spark applications to access Amazon Redshift data from AWS analytics services such as Amazon EMR, AWS Glue, and Amazon SageMaker. Using these services, you can build Apache Spark applications that read from and write to your Amazon Redshift data warehouse. The integration uses AWS Identity and Access Management (IAM)–based credentials and is designed to remove the need for manual setup and maintenance of uncertified versions of third-party connectors, so you can quickly start Apache Spark jobs that use data in Amazon Redshift.
Apache Spark analytics with Amazon Redshift data
You can expand the number of data sources that you can use in your analytics and machine learning (ML) applications running in Amazon EMR, AWS Glue, or SageMaker by reading from and writing data to your data warehouse.
Access Amazon Redshift data
You can avoid the cumbersome process of setting up and maintaining uncertified connectors and JDBC drivers.
An Amazon certified connector
You can use pushdown capabilities such as sort, aggregate, limit, join, and scalar functions so that only the data relevant to you is moved from the Amazon Redshift data warehouse to the consuming Spark application.
Amazon Redshift Availability and Resiliency
Amazon Redshift is a cloud-based data warehouse that is designed to support recovery capabilities for addressing unforeseen outages and minimizing downtime. Data stored in Redshift Managed Storage (RMS) is backed by Amazon S3. Amazon Redshift also supports automatic backups, automatic remediation of failures, and the ability to relocate a cluster to another Availability Zone (AZ) without changes to applications. Amazon Redshift supports Multi-AZ deployment to operate in multiple AZs simultaneously.
Recovery
Amazon Redshift Multi-AZ deployment is designed to allow your workloads to recover from unforeseen outages without user intervention, as the data warehouse operates in two AZs simultaneously.
Business continuity
A Multi-AZ deployment is designed to split compute resources across two AZs automatically and be accessible through a single endpoint.
Data warehouse performance
Multi-AZ deployment is managed as a single data warehouse and is designed to distribute workload processing across multiple AZs.
Amazon Redshift Multi-AZ Deployments
Multi-AZ deployment supports running a Redshift data warehouse in multiple AWS Availability Zones (AZs) simultaneously.
Cluster Relocation
The cluster relocation feature is designed to move a cluster to another AZ in one step without requiring application changes. This feature is available on clusters using the RA3 instance family.
Amazon Redshift: AWS Services Integration
Whether your data is stored in operational data stores, data lakes, streaming data services, or third-party datasets, Amazon Redshift is designed to help you access, combine, and share data, and run analytics with minimal data movement or copying. Amazon Redshift is integrated with AWS database, analytics, and machine learning (ML) services using zero-ETL approaches to help you access data in place for near-real-time analytics, build ML models in SQL, and activate Apache Spark analytics using data in Amazon Redshift. You can reduce data management activities like building complex extract, transform, and load (ETL) pipelines.
Unified analytics
You can break through data silos with federated querying, which enables you to access data in place from operational databases, data lakes, and data warehouses. This helps teams across your organization’s accounts and Regions work on data without moving or copying it. You can find, subscribe to, and query third-party datasets with zero ETL, and connect this data to BI tools like Amazon QuickSight or to your applications using data APIs.
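As an example of federated querying, you can register an external schema over an RDS or Aurora PostgreSQL database and join its live tables with warehouse tables; the endpoint, database name, and secret ARN below are placeholders:

```sql
-- Map a live Aurora/RDS PostgreSQL database into Redshift.
CREATE EXTERNAL SCHEMA orders_fed
FROM POSTGRES
DATABASE 'ordersdb' SCHEMA 'public'
URI 'orders-db.example.us-east-1.rds.amazonaws.com' PORT 5432
IAM_ROLE DEFAULT
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:orders-db';

-- Join operational rows with warehouse tables, with no ETL pipeline.
SELECT c.segment, SUM(o.amount) AS total
FROM orders_fed.orders o
JOIN dim_customer c ON c.customer_id = o.customer_id
GROUP BY c.segment;
```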
Data ingestion
You can run real-time analytics on your data with Amazon Aurora zero-ETL integration with Amazon Redshift, which is designed to make data available in the warehouse for analytics within seconds of it being written into Amazon Aurora. Support for autocopy improves file ingestion from Amazon Simple Storage Service (S3). Redshift Streaming Ingestion capabilities are designed so you can ingest streaming data with high throughput and low latency.
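For streaming ingestion, for example, a materialized view defined over an Amazon Kinesis Data Streams stream lands records in the warehouse within seconds of arrival; the stream and view names here are hypothetical:

```sql
-- Map your Kinesis Data Streams account into a Redshift schema.
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE DEFAULT;

-- Materialize the stream; AUTO REFRESH keeps it current as data arrives.
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       JSON_PARSE(kinesis_data) AS payload
FROM kds."my-click-stream";
```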
Analytics and built-in ML
Your developers can run Apache Spark applications directly on Amazon Redshift data from AWS Analytics services, such as Amazon EMR and AWS Glue. Amazon Redshift integration for Apache Spark is designed to support Apache Spark-based applications. With Amazon Redshift ML, you can run predictions with SQL commands with native integration into Amazon SageMaker. Amazon Redshift’s integration with Amazon Forecast assists with conducting ML forecasting using SQL.
Amazon Redshift Price Performance
Amazon Redshift is designed to offer price performance for analytics workloads. You can keep the performance of your data workloads high with Massively Parallel Processing (MPP) architecture, separation of storage and compute, Concurrency Scaling, and machine learning–led performance-improvement techniques like short query acceleration, automated materialized views, vectorized scans, Automatic Workload Manager (Auto WLM), and Automatic Table Optimization (ATO).
Scale linearly
You can scale data and compute up or down. You can use the AWS Nitro System with Auto-Materialized Views, Automatic Table Optimization, Automatic Workload Manager, and Short Query Accelerator.
Cost savings
Amazon Redshift Serverless is designed so your data warehouse capacity automatically scales up or down according to your workload demands, and shuts down during periods of inactivity. With provisioned instances, you can pay for your database by the hour, or with reserved instance pricing.
Self-learning, self-tuning system
Machine learning (ML) features like vectorized querying techniques, string-data performance enhancements, and short query acceleration are designed to lower query latency for high-concurrency analytics workloads and reduce manual intervention. Amazon Redshift’s Automatic Workload Manager uses ML to manage memory and concurrency. Automated materialized views rewrite thousands of queries every day.
Additional Information
For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.