AWS for Industries

FSI Services Spotlight: Featuring Amazon Redshift

In this edition of the Financial Services Industry (FSI) Services Spotlight monthly blog series, we highlight five key considerations of Amazon Redshift: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. Each of the five areas includes specific guidance, suggested reference architectures, and technical code that can help streamline service approval of Amazon Redshift in your environment. These may need to be adapted to your business, compliance, and security requirements.

Amazon Redshift is a fast, fully managed, cloud-native, and cost-effective data warehouse. It enables simple analysis of customer data using standard SQL and a customer’s existing Business Intelligence (BI) tools. It allows AWS customers to run complex analytic queries against terabytes to petabytes of structured and semi-structured data, using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution. You can run queries across petabytes of data in your Amazon Redshift cluster and access exabytes of data directly in your S3 data lake with minimal time to insight using Amazon Redshift Spectrum. You can set up a cloud data warehouse in minutes, starting small for just $0.25 per hour and scaling up to petabytes of data and thousands of concurrent users.

Amazon Redshift is used by a variety of financial services institutions around the world to help them achieve better business outcomes through faster and more accurate data-driven decisions. Nasdaq uses a data lake based on Amazon S3 and Amazon Redshift to ingest 70 billion records per day, load market data 5 hours faster, and run Amazon Redshift queries 32 percent faster than its previous solution.

Upstox is building out its analytics capabilities using Amazon Redshift for greater insight into customer journeys. Data from across the business—including on its mobile app and website—is stored in Amazon Redshift, with in-house services querying the data for specific views. “We discovered through analyzing data in Amazon Redshift that sign-up processes on our website were slow. We fixed that immediately and conversion rates increased by 5–10 percent,” said Shrini Viswanath, Cofounder and Chief Technology Officer, Upstox.

Achieve compliance with Amazon Redshift

Security is a shared responsibility between AWS and the customer. AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud and also provides you with services that you can use securely. A customer’s side of the shared responsibility when using Amazon Redshift is determined by the sensitivity of the customer’s data, their organization’s compliance and security objectives, and applicable laws and regulations. Amazon Redshift falls under the scope of the following compliance programs, and the related documents are available on demand through AWS Artifact. For more information, see AWS Artifact.

  • Cloud Computing Compliance Controls Catalogue (C5)
  • ISO 27001:2013 Statement of Applicability (SoA)
  • ISO 27001:2013 Certification
  • ISO 27017:2015 Statement of Applicability (SoA)
  • ISO 27017:2015 Certification
  • ISO 27018:2014 Statement of Applicability (SoA)
  • ISO 27018:2014 Certification
  • ISO 9001:2015 Certification
  • PCI DSS Attestation of Compliance (AOC) and Responsibility Summary
  • Service Organization Controls (SOC) 1 Report
  • Service Organization Controls (SOC) 2 Report
  • Service Organization Controls (SOC) 2 Report For Confidentiality

Data protection with Amazon Redshift

Encryption At-Rest

To keep customer data secure at rest, Amazon Redshift encrypts each block of the Amazon Redshift storage system using hardware-accelerated AES-256 as it is written to disk, using either the default AWS Key Management Service (AWS KMS) key or a customer managed KMS key. This takes place at a low level in the I/O subsystem, which encrypts everything written to disk, including intermediate query results. The blocks are backed up as is, which means that backups and snapshots are encrypted as well. By default, Amazon Redshift handles key management, but customers can choose to manage their own keys in AWS KMS.
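
As a starting point, the following is a minimal boto3 sketch of creating a cluster with encryption at rest enabled using a customer managed KMS key; the cluster identifier, credentials, and key ARN are placeholders to adapt to your environment.

import boto3

redshift = boto3.client("redshift")

# Create a cluster with encryption at rest enabled, using a
# customer managed KMS key (placeholder ARN shown below).
redshift.create_cluster(
    ClusterIdentifier="fsi-analytics-cluster",    # hypothetical cluster name
    NodeType="ra3.4xlarge",
    NumberOfNodes=2,
    MasterUsername="awsuser",
    MasterUserPassword="<replace-with-secret>",   # store and retrieve via Secrets Manager in practice
    Encrypted=True,
    KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/<key-id>",
    PubliclyAccessible=False,
)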

Customers can use this pattern to receive an automatic notification when a new Amazon Redshift cluster is created without encryption.

Figure 1: Target architecture for the pattern
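
The linked pattern uses managed services to automate this detection; purely as an illustration of the underlying check, the following boto3 sketch scans existing clusters and publishes a notification to a hypothetical SNS topic for any cluster without encryption at rest.

import boto3

redshift = boto3.client("redshift")
sns = boto3.client("sns")

# Hypothetical SNS topic used for security notifications.
TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:redshift-security-alerts"

# Flag any cluster that is not encrypted at rest.
for page in redshift.get_paginator("describe_clusters").paginate():
    for cluster in page["Clusters"]:
        if not cluster.get("Encrypted", False):
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="Unencrypted Amazon Redshift cluster detected",
                Message=f"Cluster {cluster['ClusterIdentifier']} is not encrypted at rest.",
            )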

Central administration teams can choose to use a variant of the following SCP to limit snapshot actions by users in the specific accounts where Amazon Redshift is run:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Deny",
      "Action": [
        "redshift:BatchDeleteClusterSnapshots",
        "redshift:BatchModifyClusterSnapshots",
        "redshift:CopyClusterSnapshot",
        "redshift:DeleteClusterSnapshot",
        "redshift:ModifyClusterSnapshot",
        "redshift:AuthorizeSnapshotAccess",
        "redshift:CreateSnapshotCopyGrant"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
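
As a usage sketch (assuming AWS Organizations is already set up and the organizational unit ID is a placeholder), the policy above could be created and attached with boto3 as follows.

import boto3

org = boto3.client("organizations")

# Load the SCP shown above from a local file (hypothetical path).
with open("deny-redshift-snapshot-actions.json") as f:
    scp_document = f.read()

policy = org.create_policy(
    Name="DenyRedshiftSnapshotActions",
    Description="Deny snapshot sharing and modification actions for Amazon Redshift",
    Type="SERVICE_CONTROL_POLICY",
    Content=scp_document,
)

# Attach the SCP to a placeholder organizational unit.
org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-examplerootid-exampleouid",
)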

Encryption In-Transit

AWS applies the following mechanisms to secure the customer’s data in transit between an Amazon Redshift cluster and the customer’s applications, the AWS CLI, SDKs, and API clients, as well as other AWS services such as Amazon S3, Amazon DynamoDB, and Advanced Query Accelerator (AQUA) for Amazon Redshift.

  • Amazon Redshift supports Secure Sockets Layer (SSL) connections to encrypt data and uses AWS Certificate Manager (ACM)-issued server certificates so that the client can validate the server certificate it connects to. The client connects to the leader node of an Amazon Redshift cluster. For more information, see Configuring security options for connections. To configure your cluster to require an SSL connection, set the require_ssl parameter to true in the parameter group that is associated with the cluster (see the sketch after this list).
  • To protect data in transit within the AWS Cloud, Amazon Redshift uses hardware accelerated SSL to communicate with Amazon S3 or Amazon DynamoDB for COPY, UNLOAD, backup, and restore operations.
  • Data is transmitted between AQUA and Amazon Redshift clusters over a TLS-encrypted channel. This channel is signed according to the Signature Version 4 Signing Process (Sigv4).
  • Amazon Redshift also provides HTTPS endpoints for encrypting data in transit.
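
To illustrate the require_ssl setting referenced above, the following is a minimal boto3 sketch, assuming a custom (non-default) parameter group named redshift-fsi-params already exists and is associated with the cluster.

import boto3

redshift = boto3.client("redshift")

# Require SSL for all client connections to clusters that use this
# parameter group (the group name here is a placeholder).
redshift.modify_cluster_parameter_group(
    ParameterGroupName="redshift-fsi-params",
    Parameters=[
        {"ParameterName": "require_ssl", "ParameterValue": "true"},
    ],
)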

In addition to the above-mentioned controls, customers can use AWS Partner Network (APN) products from AWS Marketplace to protect their Redshift data; see this blog for an example of using a partner product to protect sensitive data in Redshift.

Isolation of compute environments with Amazon Redshift

We recommend customers deploy an Amazon Redshift cluster within a VPC. When a customer provisions an Amazon Redshift cluster in a VPC, it is locked down by default so nobody has access to it. To grant other users inbound access to an Amazon Redshift cluster, you associate the cluster with a security group. For more information on managing an Amazon Redshift cluster in a VPC, see Managing clusters in a VPC.
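
As an illustration (with placeholder IDs and an assumed CIDR range for client applications), the following sketch authorizes inbound access on the default Redshift port and associates the security group with a cluster.

import boto3

ec2 = boto3.client("ec2")
redshift = boto3.client("redshift")

# Allow inbound connections on the default Redshift port (5439)
# from a placeholder application subnet range only.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 5439,
            "ToPort": 5439,
            "IpRanges": [{"CidrIp": "10.0.0.0/16", "Description": "BI application subnets"}],
        }
    ],
)

# Associate the security group with the cluster.
redshift.modify_cluster(
    ClusterIdentifier="fsi-analytics-cluster",   # hypothetical cluster name
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],
)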

Customers can connect directly to the Amazon Redshift API service using an interface VPC endpoint (AWS PrivateLink) in the VPC to access the Redshift API privately. When customers use an interface VPC endpoint, communication between their VPC and Amazon Redshift is conducted entirely within the AWS network without using an internet gateway, network address translation (NAT) device, virtual private network (VPN) connection, or AWS Direct Connect connection. Each VPC endpoint is represented by one or more elastic network interfaces with private IP addresses in a customer’s VPC subnets. We recommend customers attach VPC endpoint policies to a VPC endpoint to control access for AWS Identity and Access Management (IAM) principals. You can also associate security groups with a VPC endpoint to control inbound and outbound access based on the origin and destination of network traffic, as illustrated in the sketch that follows.
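
The following is a minimal sketch of creating such an interface endpoint with an attached endpoint policy; the VPC, subnet, and security group IDs are placeholders, and the policy shown is only an illustrative example that restricts access to principals in one account.

import json
import boto3

ec2 = boto3.client("ec2")

# Illustrative endpoint policy: allow Redshift API actions only for
# principals in this account (adjust to your own access model).
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "redshift:*",
            "Resource": "*",
            "Condition": {"StringEquals": {"aws:PrincipalAccount": "111122223333"}},
        }
    ],
}

ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.us-east-1.redshift",   # Redshift API endpoint service in us-east-1
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
    PolicyDocument=json.dumps(endpoint_policy),
)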

We recommend customers use an Amazon Redshift-managed VPC endpoint (powered by AWS PrivateLink) to connect to a private Amazon Redshift cluster with the RA3 instance type within their virtual private cloud (VPC). With an Amazon Redshift-managed VPC endpoint, you can privately access your Amazon Redshift data warehouse within your VPC from client applications in another VPC within the same AWS account, in another AWS account, or running on premises, without using public IPs or requiring traffic to traverse the internet.

Figure 2: Architecture depicting Amazon Redshift access via Amazon Redshift-managed VPC endpoints

See this blog for more details on how to enable private access to Amazon Redshift from your client applications in another VPC.
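
For clusters that meet the RA3 prerequisites, a Redshift-managed VPC endpoint can be created with a call like the following sketch; the endpoint name, subnet group, and security group are placeholders.

import boto3

redshift = boto3.client("redshift")

# Create a Redshift-managed VPC endpoint so that clients in the
# consumer VPC can reach the cluster privately over PrivateLink.
redshift.create_endpoint_access(
    ClusterIdentifier="fsi-analytics-cluster",    # hypothetical cluster name
    EndpointName="fsi-analytics-endpoint",        # placeholder endpoint name
    SubnetGroupName="consumer-vpc-subnet-group",  # subnet group in the consumer VPC
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],
)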

Automating audits with APIs with Amazon Redshift

AWS Config monitors the configuration of resources and provides managed rules to alert you when resources fall into a non-compliant state. For Amazon Redshift, there are several managed AWS Config rules, such as redshift-require-tls-ssl and redshift-cluster-public-access-check, that you should evaluate and consider turning on; see the AWS Config managed rules documentation for the full list.
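
A minimal boto3 sketch of enabling one of these managed rules follows, using redshift-require-tls-ssl as the example and assuming an AWS Config recorder is already enabled in the account and Region.

import boto3

config = boto3.client("config")

# Enable the managed rule that checks whether clusters require
# TLS/SSL for client connections.
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "redshift-require-tls-ssl",
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "REDSHIFT_REQUIRE_TLS_SSL",
        },
    }
)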

Amazon Redshift is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon Redshift. CloudTrail captures all API calls for Amazon Redshift as events. These include calls from the Amazon Redshift console and from code calls to the Amazon Redshift API operations. Using the information collected by CloudTrail, you can determine the request that was made to Amazon Redshift, the IP address from which the request was made, who made the request, when it was made, and additional details. For more information, see Logging Amazon Redshift API calls with AWS CloudTrail.

There are several Amazon Redshift API operations that you should monitor in your environment, such as those that create, modify, or delete clusters and snapshots.

You can view a complete list of all available Amazon Redshift API operations here.

The following example shows a CloudTrail log entry for a sample ModifyCluster call made against a Redshift cluster:

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AROAYZ2KHWS3EUNOH2NCG:someuser",
        "arn": "arn:aws:sts::123456789123:assumed-role/AWS-ProServe-Team/someuser",
        "accountId": "123456789123",
        "accessKeyId": "AISAYZ6KHWS3HJEFWPSV",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AROAYZ2KHWS3EUNOH2NCG",
                "arn": "arn:aws:iam::123456789123:role/AWS-ProServe-Team",
                "accountId": "123456789123",
                "userName": "AWS-ProServe-Team"
            },
            "webIdFederationData": {},
            "attributes": {
                "mfaAuthenticated": "false",
                "creationDate": "2021-05-20T13:48:10Z"
            }
        }
    },
    "eventTime": "2021-05-20T15:27:11Z",
    "eventSource": "redshift.amazonaws.com",
    "eventName": "ModifyCluster",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "xx.xx.xx.xx",
    "userAgent": "aws-internal/3 aws-sdk-java/1.11.965",
    "requestParameters": {
        "clusterIdentifier": "redshift-test-cluster",
        "masterUserPassword": "****",
        "availabilityZoneRelocation": true
    }
}
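
Because these API calls are delivered through CloudTrail, customers can alert on sensitive Redshift actions with an Amazon EventBridge rule. The following is a minimal sketch, assuming an existing SNS topic (placeholder ARN) as the notification target and ModifyCluster and DeleteCluster as example actions to watch.

import json
import boto3

events = boto3.client("events")

# Match CloudTrail-recorded Redshift API calls of interest.
event_pattern = {
    "source": ["aws.redshift"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["redshift.amazonaws.com"],
        "eventName": ["ModifyCluster", "DeleteCluster"],
    },
}

events.put_rule(
    Name="redshift-sensitive-api-calls",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# Send matching events to a placeholder SNS topic for review
# (the topic policy must allow EventBridge to publish).
events.put_targets(
    Rule="redshift-sensitive-api-calls",
    Targets=[{"Id": "sns-target", "Arn": "arn:aws:sns:us-east-1:111122223333:redshift-security-alerts"}],
)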

FSI customers can use AWS Audit Manager to continuously audit their AWS usage and simplify how they assess risk and compliance with regulations and industry standards. AWS Audit Manager automates evidence collection and organizes the evidence as defined by the control set in the framework selected, such as PCI DSS, SOC 2, and GDPR. Audit Manager collects data from sources including CloudTrail to compare the environment’s configurations against the compliance controls. Because all Amazon Redshift API calls are logged in CloudTrail, Audit Manager’s integration with CloudTrail is advantageous when you need to demonstrate that controls have been met. Consider the encryption requirement in SOC 2, for example. Rather than querying across all CloudTrail logs to verify that Redshift clusters are encrypted, customers can centrally see whether the requirement is being met in Audit Manager. Audit Manager saves time with automated collection of evidence and provides audit-ready reports for customers to review. The Audit Manager assessment report uses cryptographic verification to help you ensure the integrity of the assessment report.

The following screenshot illustrates the configuration of a custom control with a data source for the Amazon Redshift action of interest.

Following is an example of evidence in Audit Manager from a custom Redshift control. The clusterStatus field shows “creating”, which indicates that a new Redshift cluster has been created, as defined by the custom control.

{
  "clusterIdentifier": "redshift-cluster-2",
  "nodeType": "ra3.4xlarge",
  "clusterStatus": "creating",
  "clusterAvailabilityStatus": "Modifying",
  "masterUsername": "awsuser",
  "dBName": "dev",
  "automatedSnapshotRetentionPeriod": 1,
  "manualSnapshotRetentionPeriod": -1,
  "clusterSecurityGroups": [],
  "vpcSecurityGroups": [
    {
      "vpcSecurityGroupId": "sg-584c4d10",
      "status": "active"
    }
  ],
  "clusterParameterGroups": [
    {
      "parameterGroupName": "default.redshift-1.0",
      "parameterApplyStatus": "in-sync"
    }
  ],
  "clusterSubnetGroupName": "default",
  "vpcId": "vpc-86a4affd",
  "preferredMaintenanceWindow": "sun:00:00-sun:00:30",
  "pendingModifiedValues": {
    "masterUserPassword": "****"
  },
  "clusterVersion": "1.0",
  "allowVersionUpgrade": true,
  "numberOfNodes": 2,
  "publiclyAccessible": false,
  "encrypted": false,
  "tags": [],
  "enhancedVpcRouting": false,
  "iamRoles": [],
  "maintenanceTrackName": "current",
  "deferredMaintenanceWindows": [],
  "nextMaintenanceWindowStartTime": "Jun 20, 2021 12:00:00 AM",
  "aquaConfiguration": {
    "aquaStatus": "disabled",
    "aquaConfigurationStatus": "auto"
  }
}

Operational access and security control with Amazon Redshift

Security, Availability and Confidentiality commitments to customers are documented and communicated in Service Level Agreements (SLAs) and other customer agreements, as well as in the Report on the Amazon Web Services System Relevant to Security, Availability, and Confidentiality. Security, Availability and Confidentiality commitments are standardized and include, but are not limited to, the following:

  • Appropriately restrict unauthorized internal and external access to data, and ensure that one customer’s data is appropriately segregated from that of other customers.
  • Safeguard data from within and outside of the boundaries of AWS environments which store a customer’s content to meet the service commitments.

Amazon Redshift logs information about connections and user activities in the database. These logs help you to monitor the database for security and troubleshooting purposes, which is a process often referred to as database auditing. The logs are stored in Amazon S3 buckets. These provide convenient access with data security features for users who are responsible for monitoring activities in the database. Amazon Redshift logs information in the following log files:

  • Connection log — logs authentication attempts, and connections and disconnections.
  • User log — logs information about changes to database user definitions.
  • User activity log — logs each query before it is run on the database.

The connection and user logs are useful primarily for security purposes. You can use the connection log to monitor information about the users who are connecting to the database and the related connection information. This information might include their IP address, when they made the request, what type of authentication they used, and so on. You can use the user log to monitor changes to the definitions of database users. The Redshift user activity log is useful primarily for troubleshooting purposes. It tracks information about the types of queries that both the users and the system perform in the database. Audit logging is not enabled by default. To enable audit logging, you will have to turn the feature on through the Redshift cluster properties, as shown in the sketch that follows. For more detailed information, see Redshift database audit logging.
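
A minimal boto3 sketch of enabling audit logging follows; the cluster identifier and the S3 bucket (which must have a bucket policy that allows Redshift to deliver logs) are placeholders.

import boto3

redshift = boto3.client("redshift")

# Turn on database audit logging for the cluster, delivering logs to a
# placeholder S3 bucket and prefix. Note that the user activity log also
# requires enable_user_activity_logging to be true in the cluster's
# parameter group.
redshift.enable_logging(
    ClusterIdentifier="fsi-analytics-cluster",    # hypothetical cluster name
    BucketName="example-redshift-audit-logs",     # bucket must grant Redshift log delivery
    S3KeyPrefix="audit/fsi-analytics-cluster/",
)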

Conclusion

In this post, we reviewed Amazon Redshift and highlighted key information that can help FSI customers accelerate the approval of Redshift within these five categories: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. While not a one-size-fits-all approach, the guidance provided can be adapted to meet your organization’s security and compliance requirements and provides a consolidated list of key areas to focus on for Amazon Redshift. In the meantime, be sure to visit the AWS FSI Services Spotlight Blog Series and stay tuned for more financial services news and best practices. For more information, visit the AWS Financial Services page and AWS Compliance Center, and read the Security Pillar – AWS Well-Architected Framework whitepaper.

John Formento

John is a Solutions Architect at AWS. He helps large enterprises achieve their goals by architecting secure and scalable solutions on the AWS Cloud. John holds 7 AWS certifications including AWS Certified Solutions Architect – Professional and DevOps Engineer – Professional.

Po Hong

Po Hong, PhD, is a Principal Data Architect in the Lake House Global Specialty Practice, AWS Professional Services. He is passionate about helping customers adopt innovative analytics solutions. Po specializes in AWS Lake House architecture in general and Amazon Redshift in particular.

Syed Shareef

Syed is a Senior Security Solutions Architect at AWS. He works with large financial institutions to help them achieve their business goals with AWS while remaining compliant with regulatory and security requirements. Syed holds the CISSP, CISSP-ISSAP, and CSSLP certifications from (ISC)2 and CISM from ISACA, in addition to the AWS Security Specialty certification.